ESTIMATION  AND  SPECIFICATION  ANALYSIS  WITH  CENSORED  PANEL  DATA 
AN  APPLICATION  TO  THE  DIVIDEND  BEHAVIOR  MODEL 
FOR  THE  U.S.  MANUFACTURING  INDUSTRY 


By 

BYEONG  SOO  KIM 


A DISSERTATION  PRESENTED  TO  THE  GRADUATE  SCHOOL 
OF  THE  UNIVERSITY  OF  FLORIDA  IN  PARTIAL  FULFILLMENT 
OF  THE  REQUIREMENTS  FOR  THE  DEGREE  OF 
DOCTOR  OF  PHILOSOPHY 

UNIVERSITY  OF  FLORIDA 


1989 


ACKNOWLEDGEMENTS 


Special  thanks  are  extended  to  my  advisor  Professor  G.S.  Maddala 
for  his  continuous  support  and  encouragement.  His  wide  knowledge  of 
economics  and  keen  insight  into  econometrics  have  provided  me  with 
intellectual  nutrition  during  my  study  in  Florida. 

I would  also  like  to  thank  Professors  Mark  Rush,  Leonard  Cheng, 
and  J.S.  Shonkwiler  for  their  kind  help  throughout  the  completion  of 
this  dissertation.  Dr.  Kim  Sawyer  gave  me  valuable  comments  to  improve 
my  work. 

I was very fortunate to study in the United States. The time I have
spent at Florida will be an unforgettable memory of my life.

I dedicate  this  dissertation  to  my  parents  with  a thankful  heart. 




TABLE  OF  CONTENTS 


ACKNOWLEDGMENTS

ABSTRACT

CHAPTERS

1 INTRODUCTION

2 SPECIFICATION AND ESTIMATION OF CENSORED PANEL DATA MODELS

Introduction

Estimation of Censored Data Models

Specification and Estimation of Panel Data Models

Estimation of Censored Panel Data Models

Specification Analysis

3 REVIEW OF THE LITERATURE ON DIVIDEND BEHAVIOR

Introduction

The Miller and Modigliani Dividend Irrelevancy Proposition

The Signaling Hypothesis in Dividend Theory

The Agency Cost Hypothesis in Dividend Theory

Empirical Studies on Dividend Behavior

4 MODELING AND ESTIMATION OF DIVIDEND BEHAVIOR OF THE U.S. MANUFACTURING INDUSTRY

Introduction and Data

Modeling Dividend Behavior

Estimation and Specification Analysis

5 SUMMARY AND CONCLUSIONS

REFERENCES

BIOGRAPHICAL SKETCH




Abstract  of  Dissertation  Presented  to  the  Graduate  School 
of  the  University  of  Florida  in  Partial  Fulfillment  of  the 
Requirements  for  the  Degree  of  Doctor  of  Philosophy 

ESTIMATION  AND  SPECIFICATION  ANALYSIS  WITH  CENSORED  PANEL  DATA: 

AN  APPLICATION  TO  THE  DIVIDEND  BEHAVIOR  MODEL 
FOR  THE  U.S.  MANUFACTURING  INDUSTRY 

By 

Byeong  Soo  Kim 
December,  1989 

Chairman:  Dr.  G.  S.  Maddala 

Major  Department:  Department  of  Economics 

This study deals with two related subjects: the development of
specification and estimation methods for censored panel data, and an
analysis of the dividend behavior of the U.S. manufacturing industry.

In  the  first  half  of  this  study,  some  alternative  error  covariance 
structures  are  suggested  to  simplify  the  estimation  problems  caused  by 
the  complicated  error  covariance  matrix  that  one  encounters  when  the 
conventional  stochastic  assumptions  are  used  for  the  random  effects 
specification.  The  suggested  error  covariance  structures  are  not  based 
on  any  theory  or  prior  beliefs.  Therefore,  some  test  statistics  are 
proposed  to  examine  the  validity  of  the  specifications. 

The second half of this study applies the suggested model
specification to a dividend behavior model. The dividend behavior
model, which follows the framework of the Lintner model, is extended to
cover panel data. Annual data for the U.S. manufacturing industry for
the period 1976-1987 are obtained from the Compustat tape. The pooled
regression tobit model is estimated by the maximum likelihood method
under three different error covariance specifications. According to the
specification tests, one of the models with a heteroscedastic error
covariance structure turns out to be well specified. The estimation
results of the well-specified model are used to investigate the dividend
behavior of the U.S. manufacturing industry. The conclusions from the
empirical study are as follows. First, the Lintner model is consistent
with the data. Second, the hypothesis that firms are reluctant to cut
dividends is not supported for the manufacturing industry of the United
States. Third, the Miller and Modigliani dividend irrelevancy
proposition turns out to be valid. Finally, the informational
hypotheses in dividend theory (the signaling hypothesis and the agency
cost hypothesis) are rejected.




CHAPTER  1 
INTRODUCTION 

Since Tobin's (1958) suggestion of a statistical model for situations
in which the range of the dependent variable is limited in some way,
limited dependent variable models have been widely used in econometric
applications.1 After a suitable reparametrization (Olsen, 1978), the
log-likelihood function of the conventional tobit model is globally
concave, so the model can be estimated reliably by the maximum
likelihood method. Moreover, the maximum likelihood estimator of the
tobit model is consistent and asymptotically normal under standard
assumptions on the stochastic errors.

1 The model is called "truncated" if both the dependent variable and
the independent variables are missing for some observations, and
"censored" if the dependent variable is missing but the independent
variables are observed.

One of the main goals of this study is to apply the tobit model to
censored panel data. Models that use panel data attempt to exploit the
error covariance structure of the data generating process. As a result,
we can obtain estimates of the parameters relating to the attributes
common across the cross-sectional units in the panel. The conventional
model specifications in panel studies are the fixed effects model and
the random effects model. Problems arise when we try to apply the tobit
model to censored panel data under the random effects specification.
The lack of independence between observations considerably increases
the complexity of the likelihood function and creates serious
difficulties in estimation. In this thesis, alternative specifications
with special types of error covariance structure are suggested to
circumvent these problems.

These can be regarded as heteroscedasticity-adjusted tobit models,
since all the off-diagonal elements of the error covariance matrix are
assumed to be zero while a special form of heteroscedasticity is
assumed for the diagonal elements. The suggested error covariance
structures are not derived from economic theory, and so must be
subjected to tests.

Direct specification tests such as the Hausman specification test
are not applicable to the model specifications discussed in this study.
To evaluate the model specification, indirect methods must be used. The
first tests the assumption of zero error covariance between
observations on the same cross-sectional unit at different points in
time, which is critical for using maximum likelihood estimation. The
second is the parameter stability test suggested by Anderson (1987),
which evaluates the model specification by comparing the predictive
ability of the model. These two statistics are used as criteria in
evaluating the model specification.

Another objective of this study is to explain the dividend behavior
of the manufacturing industry in the United States. One of the puzzles
in the field of finance is what determines the dividend decisions of
firms. The Lintner model, which explains dividend behavior by the
current earnings variable and the dividend payout of the previous
period, has been used in most empirical studies. However, are there any
other variables that have a significant relationship with dividend
payout? The theoretical obstacle that keeps other variables out of the
dividend regression is the Miller-Modigliani (M-M) dividend irrelevancy
proposition, which is generally accepted in the field of finance. If
dividends are taxed and the capital market is imperfect, the M-M
proposition implies that no dividends will be paid. In the real economy,
however, dividends are paid in spite of their costs. Why do firms pay
dividends? Recent attempts to answer this question apply information
theory, which embraces signaling theory and principal-agent theory. The
basic idea is that dividends have informational content, which can be
used to explain why dividends exist. The signaling approach and the
agency cost explanation of dividend behavior can relax the M-M dividend
irrelevancy proposition.

The ordinary least squares (OLS) method has been the main method used
in previous empirical studies of dividends. However, OLS cannot yield
consistent estimates if there is a significant number of zero
observations for the dependent variable, i.e., dividend payout. Data in
which the dependent variable is censored at zero require the tobit
model for estimation. Moreover, panel data are needed to explain the
dividend behavior of industrial companies.

We use the framework of the well-known Lintner model of dividend
behavior. The variables characterizing the signaling hypothesis and the
agency cost hypothesis are included in the model to evaluate those
hypotheses. The models are estimated by the maximum likelihood method
under various assumptions on the error covariance structure. For the
heteroscedastic models, the concentrated maximum likelihood method is
used to resolve the difficulties stemming from the complexity of the
likelihood function.

The structure of the dissertation is as follows. In Chapter 2,
estimation methods and specification analysis for the tobit model and
for panel data models are discussed in sections 2 and 3. Estimation of
censored panel data models is discussed in the next section, and the
statistic for testing zero covariances between the errors is derived in
the final section of the chapter.

In  Chapter  3,  theories,  empirical  studies  and  econometric  aspects 
of  dividend  behavior  models  are  discussed. 

Chapter  4 suggests  a dividend  behavior  model.  This  model  is 
estimated  by  the  maximum  likelihood  method.  Specification  analysis  is 
then  used  to  evaluate  the  model. 

The  final  chapter  presents  the  summary  and  the  conclusions. 


CHAPTER  2 

SPECIFICATION  AND  ESTIMATION  OF  CENSORED  PANEL  DATA  MODELS 

Introduction 

Limited  dependent  variable  models  that  are  commonly  used  in 
empirical  studies  refer  to  the  econometric  models  in  which  the  dependent 
variables  are  limited  in  some  way.  The  term  "limited  dependent  variable 
models"  comprises  various  types  of  models.  Maddala  (1983)  classified 
them  into  three  categories  according  to  the  type  of  limitation  on  the 
dependent  variables.  They  are  truncated  regression  models,  censored 
regression  models,  and  dummy  endogenous  variable  models. 

The main interest of this study is the censored regression model,
usually called the tobit model. The classical least squares method is
not appropriate for analyzing censored data, since the resulting
estimates are biased and inconsistent. Therefore, other techniques such
as the maximum likelihood method must be used to obtain consistent
estimators. Recently, various types of tobit models and their
estimation methods have been proposed.1 In the next section of the
chapter, the structure and the estimation methods are discussed in the
context of the standard tobit model.

1 Amemiya (1984) classified the diverse tobit models into five types
according to the form of the likelihood function and discussed various
estimation methods.

The recent availability of good panel data (cross sections of
individuals over time) allows us to construct more realistic models.
Such data help to explain behavior that could not be explained using
only a single cross section at a point in time or a time series for a
single unit. Another advantage of panel data sets is that their sample
size is usually large, which increases the degrees of freedom and may
improve the efficiency of econometric estimates. Panel data may also
help to resolve the problem of missing or unobserved variables.

The  conventional  single  equation  model  specification  for  panel 
data  is  written  as  follows: 


$$ y_{it} = \alpha_i + \gamma_t + \beta X_{it} + u_{it}, \tag{2.1} $$


where $\alpha_i$ is the individual specific term, $\gamma_t$ is the time
specific term, and $u_{it}$ is the stochastic error term, which is
assumed to be independently and identically distributed with mean zero
and variance $\sigma_u^2$. Several model specifications are possible,
depending on how each parameter is treated. In this study, the time
specific term is omitted for simplicity, and the slope coefficient
$\beta$ is assumed to be constant over individuals and time. The focus
of this study is the treatment of the individual specific terms
$\alpha_i$.

Two commonly used specifications for panel data are the "fixed
effects model" and the "random effects model." Whether the individual
specific terms are treated as fixed constants or as random variables
clearly influences the estimation results as well as the estimation
procedure.

A potentially rich area of empirical research is that of censored
panel data. However, we encounter difficulties when we extend the
estimation techniques appropriate for the standard tobit model or for
continuous panel data models to censored panel data.

Under the fixed effects assumptions, the extension of the tobit
model to censored panel data is relatively straightforward. On the
other hand, the random effects specification makes the error covariance
structure of the model more complicated, and sometimes makes it
impossible to estimate the model by conventional techniques such as
maximum likelihood. Alternative error covariance structures, which
simplify the problems encountered in the estimation procedure, will be
suggested later. In the suggested specifications, all the off-diagonal
elements of the error covariance matrix are assumed to be zero, and the
diagonal elements are assumed to have some special form of
heteroscedasticity.

The suggested error covariance structures are essentially arbitrary.
Therefore, some criteria for evaluating the suggested error covariance
specifications are also proposed.

Estimation  of  Censored  Data  Models 

Censored regression models, in which the range of the dependent
variable is constrained in some way, have been developed since Tobin
(1958) suggested an estimation method to analyze household expenditures
on durable goods. Such expenditures cannot be negative, so the data are
censored at zero. Tobit models refer to censored or truncated
regression models. The standard tobit model is written as

$$ y_i = \begin{cases} X_i\beta + u_i & \text{if } X_i\beta + u_i > 0, \\ 0 & \text{otherwise}, \end{cases} \tag{2.2} $$




where $\beta$ is a $k \times 1$ vector of unknown parameters, $X_i$ is a
$1 \times k$ vector of explanatory variables, and the $u_i$'s are
assumed to be independently and normally distributed with mean zero and
a common variance $\sigma^2$.
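To make the censoring in (2.2) concrete, the following sketch simulates the standard tobit model with one regressor and compares the least squares slope with the true coefficient. It is a minimal illustration under assumptions of our own (a single standard normal regressor, intercept 0, slope 1, $\sigma = 1$), and the function names are hypothetical rather than taken from the study.

```python
import random


def simulate_tobit(n, beta0, beta1, sigma, seed=0):
    """Draw (x_i, y_i) from the standard tobit model (2.2) with one regressor.

    Assumes x_i ~ N(0, 1); y_i* = beta0 + beta1 * x_i + u_i with u_i ~ N(0, sigma^2),
    and the observed y_i = max(y_i*, 0).
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y_star = beta0 + beta1 * x + rng.gauss(0.0, sigma)
        xs.append(x)
        ys.append(max(y_star, 0.0))   # censoring at zero
    return xs, ys


def ols_slope(xs, ys):
    """Slope of a least squares regression of y on a constant and x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


xs, ys = simulate_tobit(20000, beta0=0.0, beta1=1.0, sigma=1.0)
share_zero = sum(1 for y in ys if y == 0.0) / len(ys)
slope_all = ols_slope(xs, ys)
positive = [(x, y) for x, y in zip(xs, ys) if y > 0.0]
slope_pos = ols_slope([x for x, _ in positive], [y for _, y in positive])
print(f"share censored: {share_zero:.3f}")
print(f"OLS slope, all obs: {slope_all:.3f}; positive obs only: {slope_pos:.3f}; true: 1.0")
```

Because $x$ and $y^*$ are jointly normal in this setup, the population OLS slope on the full sample equals the true slope multiplied by the probability of a positive observation (here one half), so the full-sample estimate is pulled well toward zero.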

Least squares estimation using all the observations leads to
biased estimates, and least squares regression using only the positive
observations also yields biased estimates.2 Therefore, other methods
must be applied to obtain consistent estimators. The most commonly used
estimation methods are the maximum likelihood (ML) method and the
two-stage estimation method suggested by Amemiya (1973) and developed by
Heckman (1979). The idea of the two-stage method can be seen from the
conditional expectation

$$ E(y_i \mid y_i > 0) = X_i\beta + E(u_i \mid u_i > -X_i\beta) = X_i\beta + \sigma\,\frac{\phi_i}{\Phi_i}, \tag{2.3} $$

where $\phi_i$ and $\Phi_i$ are the standard normal density function and
the standard normal distribution function evaluated at $X_i\beta/\sigma$.
The term $\phi_i/\Phi_i$, known as the inverse Mills ratio, is a
function of $\beta/\sigma$, and its estimated value can be obtained from
a probit regression.3 The estimated Mills ratio is then used as an
additional explanatory variable in the second stage, where consistent
estimates of $\beta$ and $\sigma$ are obtained by OLS using only the
positive observations on the dependent variable. The two-stage method
has a computational advantage over the ML method because it needs
iteration only for the probit likelihood function, which is well behaved
and guarantees convergence. Therefore, it is useful for more complex
tobit models, where a global maximum cannot be guaranteed. For the
standard tobit model, however, the computational advantage is not so
great, since the standard tobit likelihood function also has a unique
global maximum. Olsen (1978) proved the global concavity of the function
for the reparametrized tobit model by showing that the matrix of second
derivatives is negative semi-definite.4 The ML estimator is
asymptotically efficient.

2 The downward bias (toward zero) of the estimator from the regression
using all the observations, including the zeros, is clear, but the
direction and magnitude of the bias from the regression using only the
positive observations are not clear without further assumptions.

3 In the probit model, we can estimate only $\beta/\sigma$, not $\beta$
and $\sigma$ separately, and the estimation is iterative. The likelihood
function for the probit model is known to be well behaved, and the
iterative procedure will converge to a maximum no matter what the
starting value is.
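The two-stage method rests on the conditional mean (2.3). The sketch below evaluates $X_i\beta + \sigma\phi_i/\Phi_i$ directly and checks it against a simulated average of the positive draws. It assumes a scalar index `xb` standing in for $X_i\beta$; the helper names are illustrative only.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def truncated_mean(xb, sigma):
    """E(y | y > 0) from (2.3) for y = xb + u, u ~ N(0, sigma^2)."""
    z = xb / sigma
    inverse_mills = norm_pdf(z) / norm_cdf(z)   # phi_i / Phi_i
    return xb + sigma * inverse_mills


def simulated_truncated_mean(xb, sigma, draws=100000, seed=1):
    """Monte Carlo counterpart: average of the positive draws of xb + u."""
    rng = random.Random(seed)
    kept = [y for y in (xb + rng.gauss(0.0, sigma) for _ in range(draws)) if y > 0.0]
    return sum(kept) / len(kept)


for xb in (-0.5, 0.0, 1.0):
    print(xb, round(truncated_mean(xb, 1.0), 3), round(simulated_truncated_mean(xb, 1.0), 3))
```

At $X_i\beta = 0$ and $\sigma = 1$ the correction term equals $\phi(0)/\Phi(0) = \sqrt{2/\pi} \approx 0.798$. This is the amount by which a regression on the positive observations alone would be shifted if the Mills ratio were omitted.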

The tobit ML method maximizes the likelihood function, which combines
the probability that $y_i = 0$ with the density of $y_i$ for
$y_i > 0$. The log-likelihood function for the standard tobit model is

$$ \ln L = \sum_0 \ln(1 - \Phi_i) - \frac{N_1}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_1 (y_i - X_i\beta)^2, \tag{2.4} $$

where the summation $\sum_0$ is over the $N_0$ observations for which
$y_i = 0$, the summation $\sum_1$ is over the $N_1$ observations for
which $y_i > 0$, and $\Phi_i$ is the standard normal distribution
function evaluated at $X_i\beta/\sigma$.


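4 The reparametrized tobit model is

$$ h y_i = \begin{cases} X_i\beta_0 + v_i & \text{if } X_i\beta_0 + v_i > 0, \\ 0 & \text{otherwise}, \end{cases} $$

where $h = 1/\sigma$, $\beta_0 = \beta/\sigma$, and $v_i = u_i/\sigma$.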
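Coding (2.4) directly makes the two pieces of the likelihood explicit. The sketch below uses made-up illustrative data and hypothetical function names. It also evaluates the same function in Olsen's reparametrization from footnote 4 ($\beta_0 = \beta/\sigma$, $h = 1/\sigma$). There, the $\ln h$ term is the Jacobian of the change of variable from $y_i$ to $h y_i$, so the two values coincide.

```python
import math


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def tobit_loglik(beta, sigma, X, y):
    """Log-likelihood (2.4). X is a list of regressor rows, y the censored outcomes."""
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = sum(b * x for b, x in zip(beta, xi))
        if yi <= 0.0:
            # censored: ln(1 - Phi(xb / sigma)), computed as ln Phi(-xb / sigma)
            ll += math.log(norm_cdf(-xb / sigma))
        else:
            # uncensored: normal log-density of y_i
            ll += -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (yi - xb) ** 2 / (2.0 * sigma ** 2)
    return ll


def tobit_loglik_olsen(beta0, h, X, y):
    """The same log-likelihood in Olsen's parametrization: beta0 = beta/sigma, h = 1/sigma."""
    ll = 0.0
    for xi, yi in zip(X, y):
        xb0 = sum(b * x for b, x in zip(beta0, xi))
        if yi <= 0.0:
            ll += math.log(norm_cdf(-xb0))
        else:
            ll += math.log(h) - 0.5 * math.log(2.0 * math.pi) - 0.5 * (h * yi - xb0) ** 2
    return ll


X = [[1.0, 0.4], [1.0, -1.2], [1.0, 2.0], [1.0, 0.1]]
y = [0.0, 0.0, 2.7, 0.8]
beta, sigma = [0.3, 0.9], 1.4
print(tobit_loglik(beta, sigma, X, y))
print(tobit_loglik_olsen([b / sigma for b in beta], 1.0 / sigma, X, y))
```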




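The first-order conditions for the maximization are

$$ \frac{\partial \ln L}{\partial \beta} = -\sum_0 \frac{f_i X_i'}{1 - \Phi_i} + \frac{1}{\sigma^2}\sum_1 (y_i - X_i\beta) X_i' = 0, \tag{2.5a} $$

$$ \frac{\partial \ln L}{\partial \sigma^2} = \frac{1}{2\sigma^2}\sum_0 \frac{X_i\beta f_i}{1 - \Phi_i} - \frac{N_1}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_1 (y_i - X_i\beta)^2 = 0. \tag{2.5b} $$

The term $f_i$ denotes $\phi_i/\sigma$, where $\phi_i$ is the standard
normal density function evaluated at $X_i\beta/\sigma$.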

The normal equations are nonlinear and must be solved iteratively.
Several algorithms have been suggested for the iterative computation of
the MLE. The well-known Newton-Raphson method, which uses the matrix of
second derivatives (the Hessian matrix), has the weakness that
convergence is difficult to ensure. The negative definiteness of the
Hessian matrix must be guaranteed, and the step size sometimes has to
be adjusted.5 The scoring method uses the expected values (probability
limits) of the second derivatives; that is,
$E(\partial^2 \ln L/\partial\theta\,\partial\theta')$ is substituted for
$\partial^2 \ln L/\partial\theta\,\partial\theta'$ in defining the Hessian
matrix. An advantage of the scoring method is that the second
derivatives of the log-likelihood function need not be calculated,
since $-E[(\partial \ln L/\partial\theta)(\partial \ln L/\partial\theta')]$
can be used instead of $E(\partial^2 \ln L/\partial\theta\,\partial\theta')$.
Hence the scoring method is better suited to maximizing likelihood
functions more complicated than the standard tobit likelihood function.
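As a rough illustration of the BHHH algorithm, one of the remedies listed in footnote 5, the sketch below maximizes (2.4) for a model with an intercept $b_0$, one slope $b_1$, and $\sigma$. Each step uses the sum of outer products of the per-observation scores in place of the negative Hessian, in the spirit of the scoring idea just described. The step-halving rule, the starting values, the simulated data, and all function names are our own assumptions, not part of the original study.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def obs_loglik_and_score(theta, x, y):
    """Log-likelihood and score (d/db0, d/db1, d/dsigma) of one tobit observation."""
    b0, b1, sigma = theta
    xb = b0 + b1 * x
    if y <= 0.0:                                  # censored observation
        z = xb / sigma
        p = norm_cdf(-z)                          # 1 - Phi(z)
        if p <= 0.0:
            return -math.inf, (0.0, 0.0, 0.0)     # numerically impossible point
        r = norm_pdf(z) / p                       # phi(z) / (1 - Phi(z))
        return math.log(p), (-r / sigma, -r * x / sigma, r * xb / sigma ** 2)
    e = y - xb                                    # uncensored observation
    ll = -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - e * e / (2.0 * sigma ** 2)
    return ll, (e / sigma ** 2, e * x / sigma ** 2, -1.0 / sigma + e * e / sigma ** 3)


def totals(theta, xs, ys):
    """Total log-likelihood, total score, and sum of score outer products (BHHH matrix)."""
    if theta[2] <= 0.0:
        return -math.inf, None, None
    ll, g = 0.0, [0.0, 0.0, 0.0]
    opg = [[0.0] * 3 for _ in range(3)]
    for x, y in zip(xs, ys):
        l, s = obs_loglik_and_score(theta, x, y)
        ll += l
        for i in range(3):
            g[i] += s[i]
            for j in range(3):
                opg[i][j] += s[i] * s[j]
    return ll, g, opg


def solve(a, b):
    """Solve a d = b for a small dense system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    d = [0.0] * n
    for r in range(n - 1, -1, -1):
        d[r] = (m[r][n] - sum(m[r][k] * d[k] for k in range(r + 1, n))) / m[r][r]
    return d


def fit_tobit_bhhh(xs, ys, theta=(0.0, 0.0, 1.0), max_iter=200, tol=1e-9):
    theta = list(theta)
    ll, g, opg = totals(theta, xs, ys)
    for _ in range(max_iter):
        direction = solve(opg, g)                 # (sum s s')^{-1} (sum s)
        t = 1.0
        while t > 1e-12:                          # halve the step until lnL does not fall
            cand = [th + t * d for th, d in zip(theta, direction)]
            ll_new, g_new, opg_new = totals(cand, xs, ys)
            if ll_new >= ll:
                break
            t *= 0.5
        else:
            break                                 # no uphill step found: stop
        gain = ll_new - ll
        theta, ll, g, opg = cand, ll_new, g_new, opg_new
        if gain < tol:
            break
    return theta, ll


rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ys = [max(0.5 + 1.0 * x + rng.gauss(0.0, 1.0), 0.0) for x in xs]
theta_hat, ll_hat = fit_tobit_bhhh(xs, ys)
print("b0, b1, sigma:", [round(v, 3) for v in theta_hat], "lnL:", round(ll_hat, 2))
```

Because the outer-product matrix is positive definite, each BHHH direction is uphill. Halving the step until $\ln L$ stops falling therefore produces a non-decreasing sequence of likelihood values without any second derivatives.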

Amemiya (1973) shows that the tobit ML estimates are consistent and
asymptotically normal under standard conditions. However, what if one
of the standard conditions (normality, no serial correlation,
homoscedasticity) is violated? The properties of the tobit MLE under
non-normality are discussed in Goldberger (1980) and Arabmazar and
Schmidt (1981). Serial correlation in the tobit model is investigated by
Robinson (1982). Problems with heteroscedasticity in the tobit model are
studied by Maddala and Nelson (1975), Hurd (1979), and Arabmazar and
Schmidt (1981).

5 Examples of remedies for the weakness of the Newton-Raphson method
are the quadratic hill-climbing method, the DFP algorithm, and the BHHH
algorithm.

The tobit MLE is generally inconsistent when the true distribution
is non-normal. Tests for normality in the standard tobit model are
proposed by Nelson (1981) and Bera, Jarque and Lee (1982). Nelson's
test is an application of the Hausman specification test, and therefore
it can be interpreted as a general misspecification test rather than a
normality test. The Bera, Jarque and Lee test is an application of the
score test, which examines the null hypothesis of normality against the
alternative hypothesis of the two-parameter Pearson family of
distributions.

If the tests reject normality, there are two choices. One is to
devise methods of estimation under non-normal distributions. An example
is the least absolute deviations (LAD) estimator suggested by Powell
(1981, 1983), which is consistent under general non-normal
distributions. The other is to transform the data so that their
distribution is approximately normal, using transformation schemes such
as the Box-Cox transformation (see Maddala, 1983, pp. 190-192).

In the classical regression model, the OLS estimator is still
consistent under serial correlation even though it is not efficient. A
similar property holds in the tobit model: the tobit MLE is not
efficient but is still consistent. Therefore, problems with serial
correlation are not as serious as those arising from violations of the
other standard conditions.

Under heteroscedasticity, which is one of the concerns of this study,
the tobit MLE is not consistent. Hurd (1979) shows that the parameters
are misestimated by a substantial amount when the usual maximum
likelihood method is used for the truncated regression model with
heteroscedasticity. The usual solution to the heteroscedasticity
problem is to make some reasonable assumptions about its nature. An
example in the linear regression model is to assume that $\sigma_i^2$
is proportional to some or all of the regressors. Fishe et al. (1979)
apply the following specification to the tobit model:

$$ \sigma_i^2 = (\gamma + \delta Z_i)^2, \tag{2.6} $$

where $Z_i$ includes some or all of the regressors. Their estimation
results show that the differences in the coefficient estimates between
the simple tobit and the heteroscedastic tobit are more pronounced than
in the case of the linear model. Therefore, unlike in the usual
regression model, where OLS remains unbiased under heteroscedasticity,
ignoring heteroscedasticity in the tobit model results in
underestimation or overestimation of the coefficients. The implication
of their study is that it is useful to make some reasonable assumptions
about the nature of heteroscedasticity if heteroscedasticity is present
or expected in the tobit model.
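Specification (2.6) enters the likelihood only by replacing the common $\sigma$ in (2.4) with an observation-specific $\sigma_i$. The sketch below assumes a single heteroscedasticity variable $z_i$ and uses made-up data and a hypothetical function name. It is not Fishe et al.'s own code. With $\delta = 0$ it reduces to the homoscedastic tobit log-likelihood.

```python
import math


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def het_tobit_loglik(beta, gamma, delta, X, Z, y):
    """Tobit log-likelihood (2.4) with observation-specific sigma_i from (2.6).

    Assumes a single heteroscedasticity variable z_i, so sigma_i = |gamma + delta * z_i|.
    """
    ll = 0.0
    for xi, zi, yi in zip(X, Z, y):
        xb = sum(b * x for b, x in zip(beta, xi))
        sigma_i = abs(gamma + delta * zi)        # sigma_i^2 = (gamma + delta * z_i)^2
        if sigma_i == 0.0:
            return -math.inf                     # degenerate variance: reject this point
        if yi <= 0.0:
            ll += math.log(norm_cdf(-xb / sigma_i))
        else:
            ll += (-0.5 * math.log(2.0 * math.pi * sigma_i ** 2)
                   - (yi - xb) ** 2 / (2.0 * sigma_i ** 2))
    return ll


X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
Z = [0.5, 1.0, 2.0]
y = [1.2, 0.0, 3.1]
print(het_tobit_loglik([0.2, 1.0], 1.0, 0.0, X, Z, y))   # delta = 0: homoscedastic tobit
print(het_tobit_loglik([0.2, 1.0], 0.5, 0.4, X, Z, y))   # sigma_i grows with z_i
```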




Specification  and  Estimation  of  Panel  Data  Models 
Two  common  statistical  model  specifications  which  are  used  to 
analyze  pooled  cross-section  and  time-series  data  are  the  fixed  effects 
model  and  the  random  effects  model.  Consider  the  simple  model  which 
does  not  contain  a time  specific  component. 


$$ y_{it} = \alpha_i + \beta X_{it} + u_{it}, \qquad i = 1, 2, \ldots, N, \quad t = 1, 2, \ldots, T, $$

where $\alpha_i$ is the individual specific intercept and $u_{it}$ is a
random error term that is independently and identically distributed
with mean zero and variance $\sigma_u^2$.

The fixed effects model treats $\alpha_i$ as fixed constants over
time, whereas the random effects model treats $\alpha_i$ as random
variables, just like $u_{it}$. Whether to treat $\alpha_i$ as fixed or
random is not an easy question to answer. However, it is important to
try to specify the proper model for the data, because the results from
the two model specifications can be significantly different (see
Hausman, 1978, and Hsiao, 1986, p. 41). A specification test for this
choice is discussed later.


The fixed effects model can be estimated by the
least-squares-dummy-variable (LSDV) method.6 The LSDV estimator of
$\beta$ is unbiased, and it is consistent when $N$ or $T$ tends to
infinity. However, the estimator of $\alpha_i$ is consistent only when
$T$ goes to infinity. The LSDV method estimates the model given above
with an additional dummy variable for each cross-sectional unit:

$$ y_{it} = \sum_{j=1}^{N} \alpha_j D_{jit} + \beta X_{it} + u_{it}, \qquad D_{jit} = \begin{cases} 1 & \text{if } j = i, \\ 0 & \text{otherwise}. \end{cases} \tag{2.7} $$

6 The LSDV estimator is also called the covariance estimator or the
within-group estimator.


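Under the usual stochastic assumptions on the error term $u_{it}$, the
OLS estimator of this regression is the best linear unbiased estimator
(BLUE), and it is expressed as

$$ \hat\beta_{LSDV} = \Big[ \sum_{i=1}^{N}\sum_{t=1}^{T} (X_{it} - \bar X_i)(X_{it} - \bar X_i)' \Big]^{-1} \Big[ \sum_{i=1}^{N}\sum_{t=1}^{T} (X_{it} - \bar X_i)(y_{it} - \bar y_i) \Big], $$

where $\bar X_i$ and $\bar y_i$ are the time means for individual $i$.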
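The within transformation behind $\hat\beta_{LSDV}$ can be sketched for a scalar regressor as follows. The panel is hypothetical and noise-free, with individual effects that rise with $x$. In that setting, pooled OLS that ignores the $\alpha_i$ is badly biased, while the within estimator recovers the true slope of 2. Function names are illustrative only.

```python
def within_slope(x, y):
    """LSDV (within-group) estimator of a scalar beta; x[i][t], y[i][t] for unit i, period t."""
    sxx = sxy = 0.0
    for xi, yi in zip(x, y):
        x_bar = sum(xi) / len(xi)                 # time mean for unit i
        y_bar = sum(yi) / len(yi)
        for xit, yit in zip(xi, yi):
            sxx += (xit - x_bar) ** 2
            sxy += (xit - x_bar) * (yit - y_bar)
    return sxy / sxx


def pooled_slope(x, y):
    """OLS slope of y on a constant and x, ignoring the individual effects."""
    xs = [v for xi in x for v in xi]
    ys = [v for yi in y for v in yi]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


# Hypothetical noise-free panel: y_it = alpha_i + 2 * x_it, with alpha_i rising in x.
alpha = [0.0, 10.0, 20.0]
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
y = [[a + 2.0 * v for v in xi] for a, xi in zip(alpha, x)]
print(within_slope(x, y), pooled_slope(x, y))   # prints 2.0 5.0
```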


The random effects model, which treats $\alpha_i$ as random variables,
can be written as follows:

$$ y_{it} = \beta X_{it} + v_{it}, \qquad v_{it} = \alpha_i + u_{it}. $$


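Under the conventional stochastic assumptions

$$ \alpha_i \sim \text{iid}(0, \sigma_\alpha^2), \tag{2.8a} $$

$$ u_{it} \sim \text{iid}(0, \sigma_u^2), \tag{2.8b} $$

$$ \operatorname{cov}(\alpha_i, \alpha_j) = \begin{cases} \sigma_\alpha^2 & \text{if } i = j, \\ 0 & \text{otherwise}, \end{cases} \tag{2.8c} $$

$$ \operatorname{cov}(u_{it}, u_{js}) = \begin{cases} \sigma_u^2 & \text{if } i = j \text{ and } t = s, \\ 0 & \text{otherwise}, \end{cases} \tag{2.8d} $$

$$ \operatorname{cov}(\alpha_i, u_{jt}) = 0 \quad \text{for all } i, j, t, \tag{2.8e} $$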




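we have the following error covariance structure of the model:

$$ \operatorname{cov}(v_{it}, v_{js}) = \begin{cases} \sigma_\alpha^2 + \sigma_u^2 & \text{for } i = j \text{ and } t = s, \\ \sigma_\alpha^2 & \text{for } i = j \text{ and } t \neq s, \\ 0 & \text{otherwise}. \end{cases} \tag{2.9} $$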
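Structure (2.9) is block diagonal: observations on different individuals are uncorrelated, while the $T$ observations on one individual are equicorrelated with correlation $\sigma_\alpha^2/(\sigma_\alpha^2 + \sigma_u^2)$. The sketch below builds the full $NT \times NT$ matrix for small illustrative values. The ordering by unit, then period, and the function name are our own assumptions.

```python
def re_error_cov(n_units, n_periods, var_alpha, var_u):
    """Covariance matrix (2.9) of v_it = alpha_i + u_it, ordered by unit, then period."""
    size = n_units * n_periods
    cov = [[0.0] * size for _ in range(size)]
    for a in range(size):
        i, t = divmod(a, n_periods)
        for b in range(size):
            j, s = divmod(b, n_periods)
            if i == j:                            # same individual
                cov[a][b] = var_alpha + (var_u if t == s else 0.0)
    return cov


for row in re_error_cov(n_units=2, n_periods=2, var_alpha=0.5, var_u=1.0):
    print(row)
```

With $\sigma_\alpha^2 = 0.5$ and $\sigma_u^2 = 1$, each $2 \times 2$ block has 1.5 on the diagonal and 0.5 off it, a within-individual correlation of 1/3. These nonzero off-diagonal terms are exactly what later complicates the censored random effects likelihood.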


Because the errors of the same individual are correlated across time,
the model should be estimated by the generalized least squares (GLS)
method. The GLS estimator of the random effects model is

$$ \hat\beta_{GLS} = (W_{xx} + \theta B_{xx})^{-1} (W_{xy} + \theta B_{xy}), \tag{2.10} $$

where

$$ W_{xx} = T_{xx} - B_{xx}, \qquad W_{xy} = T_{xy} - B_{xy}, $$

$$ T_{xx} = \sum_{i=1}^{N} X_i' X_i, \qquad T_{xy} = \sum_{i=1}^{N} X_i' y_i, $$

$$ B_{xx} = \frac{1}{T} \sum_{i=1}^{N} X_i' e e' X_i, \qquad B_{xy} = \frac{1}{T} \sum_{i=1}^{N} X_i' e e' y_i, $$

$$ \theta = \frac{\sigma_u^2}{\sigma_u^2 + T\sigma_\alpha^2}. $$

Here $X_i$ and $y_i$ stack the $T$ observations on individual $i$. The
terms $W$ and $B$ refer to the within-group and between-group
variation, respectively, and $e$ is a $T \times 1$ vector with all
elements equal to unity. It is easy to see that the LSDV estimator can
be expressed as

$$ \hat\beta_{LSDV} = W_{xx}^{-1} W_{xy}. $$
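For a scalar regressor and known variance components, (2.10) is a weighted combination of within and between moments. The sketch below is a minimal illustration under those assumptions (balanced panel, no intercept, as in the model above) with a hypothetical function name and made-up data. When $\sigma_\alpha^2 = 0$, $\theta = 1$ and GLS collapses to pooled OLS through the origin, $T_{xx}^{-1}T_{xy}$. As $\theta \to 0$ it approaches the LSDV estimator $W_{xx}^{-1}W_{xy}$.

```python
def re_gls_slope(x, y, var_alpha, var_u):
    """GLS estimator (2.10) for a scalar regressor, balanced panel, known variances."""
    T = len(x[0])
    t_xx = sum(v * v for xi in x for v in xi)                  # T_xx = sum_i X_i'X_i
    t_xy = sum(a * b for xi, yi in zip(x, y) for a, b in zip(xi, yi))
    b_xx = sum(sum(xi) ** 2 for xi in x) / T                   # B_xx = (1/T) sum_i X_i'ee'X_i
    b_xy = sum(sum(xi) * sum(yi) for xi, yi in zip(x, y)) / T
    w_xx, w_xy = t_xx - b_xx, t_xy - b_xy                      # within-group moments
    theta = var_u / (var_u + T * var_alpha)
    return (w_xy + theta * b_xy) / (w_xx + theta * b_xx)


# Hypothetical panel: y_it = alpha_i + 2 * x_it with alpha_i = 0, 10, 20.
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
y = [[a + 2.0 * v for v in xi] for a, xi in zip([0.0, 10.0, 20.0], x)]
for var_alpha in (0.0, 1.0, 100.0):
    print(var_alpha, re_gls_slope(x, y, var_alpha, var_u=1.0))
```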




The existence of an individual specific effect can be tested by the
conventional F-test based on the restricted and unrestricted residual
sums of squares from the LSDV estimation. A convenient alternative for
testing the null hypothesis $\sigma_\alpha^2 = 0$, however, is the
Lagrange multiplier (LM) statistic developed by Breusch and Pagan
(1980). Under the null hypothesis,

$$ LM = \frac{NT}{2(T-1)} \left[ \frac{\sum_i \big(\sum_t \hat u_{it}\big)^2}{\sum_i \sum_t \hat u_{it}^2} - 1 \right]^2 \tag{2.11} $$

is asymptotically distributed as $\chi^2(1)$, where $\hat u_{it}$ are the
OLS residuals from the classical model without the individual specific
effects. The LM statistic can thus be used to test the classical
regression model without individual specific effects (the null
hypothesis) against the random effects model. To test the null
hypothesis of a random effects model against the alternative hypothesis
of a fixed effects model, the Hausman specification test can be applied
directly (see Hausman, 1978, and Maddala, 1988, for details). The
Hausman test statistic is obtained from the difference between two sets
of estimates. One is consistent and efficient under the null hypothesis
but inconsistent under the alternative hypothesis; the other is
consistent under both the null and alternative hypotheses but not
efficient under the null hypothesis. The GLS estimator and the LSDV
estimator can be used to obtain the Hausman test statistic. The GLS
estimator from the random effects model is both consistent and
efficient if the null hypothesis of random effects is correct. On the
other hand, the LSDV estimator is consistent regardless of whether the
null hypothesis is correct, since all the time-invariant effects are
removed by the within transformation. The test statistic is obtained
from the difference of the two estimators:

$$ m = \hat q' \big[ \hat V(\hat q) \big]^{-1} \hat q, \tag{2.12} $$

where $\hat q = \hat\beta_{LSDV} - \hat\beta_{GLS}$ and
$\hat V(\hat q) = \hat V(\hat\beta_{LSDV}) - \hat V(\hat\beta_{GLS})$. The
statistic $m$ is asymptotically distributed as $\chi^2$ with $k$ degrees
of freedom, where $k$ is the dimension of $\beta$. The null hypothesis of
the random effects model is rejected in favor of the fixed effects model
if $m$ is sufficiently large.
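Both statistics are simple to compute once residuals or estimates are in hand. The sketch below implements (2.11) for a balanced panel and (2.12) for the scalar case $k = 1$ only, with $\chi^2(1)$ tail probabilities obtained from the identity $P(\chi^2_1 > m) = \operatorname{erfc}(\sqrt{m/2})$. The residuals and estimates in the example are made-up numbers, and the function names are hypothetical. A general $k$ would need a matrix inverse in place of the division.

```python
import math


def breusch_pagan_lm(resid):
    """LM statistic (2.11); resid[i][t] are pooled OLS residuals from a balanced panel."""
    n, T = len(resid), len(resid[0])
    num = sum(sum(ri) ** 2 for ri in resid)            # sum_i (sum_t u_it)^2
    den = sum(e * e for ri in resid for e in ri)       # sum_i sum_t u_it^2
    return n * T / (2.0 * (T - 1)) * (num / den - 1.0) ** 2


def hausman_scalar(b_lsdv, b_gls, var_lsdv, var_gls):
    """Hausman statistic (2.12) when k = 1: q^2 / (V(b_lsdv) - V(b_gls))."""
    q = b_lsdv - b_gls
    return q * q / (var_lsdv - var_gls)


def chi2_1_pvalue(stat):
    """Upper-tail probability of a chi-square(1) variable: P(|Z| > sqrt(stat))."""
    return math.erfc(math.sqrt(stat / 2.0))


lm = breusch_pagan_lm([[1.0, 1.0], [-1.0, -1.0]])
m = hausman_scalar(b_lsdv=1.2, b_gls=1.0, var_lsdv=0.05, var_gls=0.01)
print(lm, chi2_1_pvalue(lm))
print(m, chi2_1_pvalue(m))
```

In the toy residuals each unit's two residuals are identical, the pattern an individual effect would produce, which pushes the ratio in (2.11) above one.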


Estimation  of  Censored  Panel  Data  Models 
In the previous section, least squares estimation methods for
panel data were discussed. The least squares method, however, cannot be
extended directly to censored panel data, where the dependent variable
is restricted in some way. In general, the least squares method applied
to tobit-type censored regression models leads to biased and
inconsistent estimates, so other methods such as maximum likelihood must
be used.

The tobit model can be extended directly to panel data under the
simple error covariance structure of the fixed effects specification
(see Heckman and MaCurdy, 1980). However, difficulties arise when we
assume random effects for the individual specific terms. The
conventional assumptions on the error covariance structure then imply
dependence between observations on the same individual across time.
Other error covariance structures must therefore be used. They should
avoid such interdependencies, as well as the difficulties caused by the
correlation between the lagged dependent variable and the current error
term that is present in dynamic models.

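The standard tobit model applied to panel data can be written as follows:

$$ y^*_{it} = \alpha_i + x_{it}\beta + \epsilon_{it}, \qquad
y_{it} = \begin{cases} y^*_{it} & \text{if } y^*_{it} \ge 0, \\ 0 & \text{if } y^*_{it} < 0. \end{cases} \tag{2.13} $$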


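Since the error term $\epsilon_{it}$ is assumed to be independently and identically distributed as $N(0, \sigma_\epsilon^2)$, the log-likelihood function for the fixed effects specification can be obtained easily. Let

$$ d_{it} = \begin{cases} 1 & \text{if } y_{it} > 0, \\ 0 & \text{otherwise.} \end{cases} $$

Then, up to an additive constant, the log-likelihood function for the fixed effects tobit model is

$$ \ln L = \sum_i \sum_t (1-d_{it}) \ln \Phi\!\left(\frac{-\alpha_i - x_{it}\beta}{\sigma_\epsilon}\right) + \sum_i \sum_t d_{it}\left[-\frac{1}{2}\ln\sigma_\epsilon^2 - \frac{1}{2\sigma_\epsilon^2}(y_{it} - \alpha_i - x_{it}\beta)^2\right]. \tag{2.14} $$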
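For reference, (2.14) can be transcribed directly into Python. The data layout (a list of `(i, x_it, y_it)` tuples with zero-based unit indices) and the function names are assumptions made for this sketch; as in (2.14), the constant -(1/2) ln 2π per non-limit observation is dropped.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function Phi."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def fe_tobit_loglik(alpha, beta, sigma2, data):
    """Log-likelihood (2.14) of the fixed effects tobit model.

    alpha  : list of fixed effects alpha_i, one per cross-section unit
    beta   : list of slope coefficients
    sigma2 : error variance sigma_eps^2
    data   : list of (i, x_it, y_it), x_it being a list of regressors
    """
    sigma = math.sqrt(sigma2)
    ll = 0.0
    for i, x, y in data:
        mean = alpha[i] + sum(b * v for b, v in zip(beta, x))
        if y > 0:   # d_it = 1: normal density term
            ll += -0.5 * math.log(sigma2) - (y - mean) ** 2 / (2.0 * sigma2)
        else:       # d_it = 0: Pr(y*_it < 0) = Phi((-alpha_i - x_it beta) / sigma)
            ll += math.log(norm_cdf(-mean / sigma))
    return ll


data = [(0, [0.0], 0.0), (0, [1.0], 1.5), (1, [0.5], 2.0)]
print(fe_tobit_loglik([0.0, 0.3], [1.0], 1.0, data))
```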


Unlike the case of the linear model, it is not possible to devise estimators of $\beta$ and $\sigma_\epsilon^2$ that are not functions of the fixed effects $\alpha_i$. When the number of time-series observations per cross-section unit is small and fixed, the fixed effects cannot be estimated consistently. Moreover, this inconsistency carries through to the estimates of $\beta$ and $\sigma_\epsilon^2$ (see Maddala, 1988, p. 325). This is a form of the "incidental parameters" problem first discussed by Neyman and Scott (1948). Heckman (1979) performs a Monte Carlo study of the multivariate probit model with fixed effects and T=8. He reports that the maximum likelihood fixed effects estimator behaves well, in the sense that the estimated parameter values are very close to the true parameter values. Although no Monte Carlo study has been done for the fixed effects tobit model, one can presume that the same result would carry over to the tobit model. Based on the results of Heckman's Monte Carlo study, Heckman and MaCurdy argue that the inconsistency of the estimates is unimportant in practice.

Another problem encountered in practical estimation is the identification of the fixed effects $\alpha_i$. If $d_{it} = 0$ for all t, the likelihood contribution of unit i keeps increasing as $\alpha_i \to -\infty$, so the corresponding estimate of $\alpha_i$ is infinite and $\alpha_i$ is not identified. Heckman and MaCurdy discard such cross-sectional units from the estimation. However, this solution to the identification problem is not desirable, because it creates a selectivity bias problem.
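The identification problem can be seen numerically. For a unit with $d_{it} = 0$ for all t, the contribution to (2.14) is $\sum_t \ln\Phi[(-\alpha_i - x_{it}\beta)/\sigma_\epsilon]$, which rises toward zero, without ever attaining it, as $\alpha_i$ decreases. The values of $x_{it}\beta$ below are hypothetical.

```python
import math


def norm_logcdf(z):
    """ln Phi(z), computed through erfc."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def all_censored_contribution(alpha_i, xb, sigma):
    """Contribution to (2.14) of a unit with d_it = 0 for every t."""
    return sum(norm_logcdf((-alpha_i - v) / sigma) for v in xb)


xb = [0.3, -0.2, 0.5, 0.1]   # hypothetical values of x_it * beta for one unit
for a in [0.0, -2.0, -5.0, -10.0]:
    print(a, all_censored_contribution(a, xb, sigma=1.0))
```

Since the contribution only increases as $\alpha_i$ falls, any numerical optimizer will push $\alpha_i$ toward the lower bound of whatever range it is allowed to search.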

For a panel data model with a short time span, Heckman and MaCurdy (1980) suggest an iterative maximum likelihood estimation method. Their method can be understood as a two-step procedure. The fixed effects parameters $\alpha_i$, which may differ across cross-section units, are estimated given values for $\beta$ and $\sigma_\epsilon^2$. The parameters $\beta$ and $\sigma_\epsilon^2$, which are common to all observations, are then estimated given the estimated values of the $\alpha_i$'s. These two steps are iterated until the estimates of both the common parameters and the fixed effects converge.7
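The two-step iteration can be illustrated as a zig-zag (coordinate-ascent) search. This is a sketch under simplifying assumptions that are not part of Heckman and MaCurdy's description: a single regressor, a golden-section line search over a bounded interval for each parameter, simulated data, and units whose observations are all censored dropped beforehand, as discussed above. It is meant to show the alternation between the two steps, not to serve as a production estimator.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def log_norm_cdf(z):
    """ln Phi(z); the floor avoids log(0) far in the lower tail."""
    return math.log(max(0.5 * math.erfc(-z / SQRT2), 1e-300))


def unit_loglik(a, beta, sigma, obs):
    """Contribution of one unit to (2.14); obs = [(x_it, y_it), ...]."""
    ll = 0.0
    for x, y in obs:
        m = a + beta * x
        if y > 0:
            ll += -math.log(sigma) - (y - m) ** 2 / (2.0 * sigma ** 2)
        else:
            ll += log_norm_cdf(-m / sigma)
    return ll


def golden_max(f, lo, hi, iters=40):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:                      # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                            # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def zigzag(units, beta=0.0, sigma=1.0, sweeps=15):
    """Alternate between the alpha_i's and the common parameters (beta, sigma)."""
    alpha = [0.0] * len(units)
    for _ in range(sweeps):
        # Step 1: each alpha_i given the current beta and sigma.
        for i, obs in enumerate(units):
            alpha[i] = golden_max(lambda a: unit_loglik(a, beta, sigma, obs), -10.0, 10.0)

        # Step 2: beta and sigma given the alpha_i's, one coordinate at a time.
        def total(b, s):
            return sum(unit_loglik(alpha[i], b, s, obs) for i, obs in enumerate(units))

        beta = golden_max(lambda b: total(b, sigma), -5.0, 5.0)
        sigma = golden_max(lambda s: total(beta, s), 0.05, 5.0)
    return alpha, beta, sigma


# Hypothetical data: alpha_i ~ N(0, 1), beta = 1, sigma_eps = 1, T = 8.
rng = random.Random(7)
units = []
for _ in range(60):
    a_i = rng.gauss(0.0, 1.0)
    obs = []
    for _ in range(8):
        x = rng.gauss(0.0, 1.0)
        obs.append((x, max(a_i + x + rng.gauss(0.0, 1.0), 0.0)))
    if any(y > 0 for _, y in obs):       # drop units with d_it = 0 for all t
        units.append(obs)

alpha_hat, beta_hat, sigma_hat = zigzag(units)
print(len(units), round(beta_hat, 3), round(sigma_hat, 3))
```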

When we allow for random effects, estimating $\beta$ and the variance parameters is not as simple as in the fixed effects model, since the covariances of the error terms between different time periods for each individual are not zero under the conventional assumptions about the error term and the individual specific terms. The random effects tobit model is written as follows:

$$ y^*_{it} = x_{it}\beta + u_{it}, \qquad u_{it} = \alpha_i + \epsilon_{it}, \qquad
y_{it} = \begin{cases} y^*_{it} & \text{if } y^*_{it} > 0, \\ 0 & \text{otherwise.} \end{cases} \tag{2.15} $$


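The conventional assumptions on $\alpha_i$ and $\epsilon_{it}$ lead to the following error covariance structure of the model:

$$ \operatorname{Cov}(u_{it}, u_{js}) = \begin{cases} \sigma_\alpha^2 + \sigma_\epsilon^2 & \text{for } i = j \text{ and } t = s, \\ \sigma_\alpha^2 & \text{for } i = j \text{ and } t \ne s, \\ 0 & \text{otherwise.} \end{cases} \tag{2.16} $$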


That is, the error covariance matrix is not diagonal, because the off-diagonal elements of the T × T submatrix for each cross-sectional unit are not zero. The problem caused by serial correlation between time periods is more serious than that caused by dependence between cross sections. Therefore, we need alternative assumptions on the error covariance structure to facilitate the estimation of the random effects model. Given the appropriate specification of the fixed effects, a convenient general alternative that eliminates the above problem is to assume that the off-diagonal elements of the error covariance matrix are zero and to absorb the dependence between cross sections into the diagonal elements of the error covariance matrix. There is no theoretical economic reason for an error covariance structure in which all the off-diagonal terms are zero. Therefore, the assumptions on the error covariance structure are essentially arbitrary, and several different schemes could be used.8

7 Heckman and MaCurdy (1980) perform a Monte Carlo study with the multivariate probit model with fixed effects and a short time period (T=8). They report that even though the estimates are inconsistent, from a practical point of view this might not be a serious problem if there are no lagged dependent variables.
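To make (2.16) concrete, the following sketch builds the full error covariance matrix for a small balanced panel, with observations ordered by unit and then by period; the variance values are arbitrary. The matrix is block diagonal, and each unit's T × T block has $\sigma_\alpha^2 + \sigma_\epsilon^2$ on the diagonal and $\sigma_\alpha^2$ off the diagonal, which is the serial dependence discussed above.

```python
def re_covariance(n_units, n_periods, var_alpha, var_eps):
    """Error covariance matrix (2.16) for observations ordered (i, t)."""
    size = n_units * n_periods
    omega = [[0.0] * size for _ in range(size)]
    for r in range(size):
        i, t = divmod(r, n_periods)
        for c in range(size):
            j, s = divmod(c, n_periods)
            if i == j and t == s:
                omega[r][c] = var_alpha + var_eps   # same unit, same period
            elif i == j:
                omega[r][c] = var_alpha             # same unit, different periods
    return omega


omega = re_covariance(n_units=2, n_periods=3, var_alpha=0.5, var_eps=1.0)
for row in omega:
    print(row)
```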

In a panel study where the stochastic error term is determined by both the cross-sectional unit and the time period, and where both cross-sectional differences and time-specific differences are being studied, it is reasonable to use both cross-sectional and period-specific characteristics in specifying the stochastic assumptions. Therefore, a combination of two variance terms is used to specify the error covariance structure of the model: $\sigma_i^2$, which is common within each cross section, represents the dispersion of errors over the sample periods in that cross section, and $\delta_t^2$, which is common within each period, represents the dispersion of errors over the cross sections in that period. Although various combinations of these two variances are possible, we assume that the variance of the error is given by the arithmetic or the geometric mean of $\sigma_i^2$ and $\delta_t^2$. The assumptions about the stochastic error terms, with the $u_{it}$ independent over i and t, can be written as

$$ u_{it} \sim \left(0,\ \frac{\sigma_i^2 + \delta_t^2}{2}\right) \tag{2.17a} $$

or

$$ u_{it} \sim \left(0,\ (\sigma_i^2 \delta_t^2)^{1/2}\right). \tag{2.17b} $$

Both error covariance structures imply that the off-diagonal elements of the error covariance matrix are zero, while the diagonal elements differ across individuals and periods. These error covariance structures can therefore be said to involve "heteroscedastic adjustments": the nonzero error covariances are set to zero, and this leads us to assume heteroscedasticity.

8 In the case of serial dependence, the probability function or joint density functions involve multiple integrals and are computationally intractable. See Maddala and Nelson (1975) for details.
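The two variance rules can be compared directly. In the sketch below the values of $\sigma_i^2$ and $\delta_t^2$ are hypothetical. Note that N + T variance parameters generate N × T distinct observation variances, and that, by the inequality of arithmetic and geometric means, the variance under (2.17b) never exceeds that under (2.17a), with equality when $\sigma_i^2 = \delta_t^2$.

```python
import math


def var_arith(sig2_i, del2_t):
    """Variance of u_it under (2.17a): arithmetic mean of sigma_i^2 and delta_t^2."""
    return (sig2_i + del2_t) / 2.0


def var_geom(sig2_i, del2_t):
    """Variance of u_it under (2.17b): geometric mean (sigma_i^2 delta_t^2)^(1/2)."""
    return math.sqrt(sig2_i * del2_t)


sig2 = [1.0, 4.0]          # hypothetical sigma_i^2, i = 1, 2
del2 = [1.0, 0.25, 9.0]    # hypothetical delta_t^2, t = 1, 2, 3
for i, s in enumerate(sig2, start=1):
    for t, d in enumerate(del2, start=1):
        print(f"i={i} t={t}  (2.17a): {var_arith(s, d):.4f}  (2.17b): {var_geom(s, d):.4f}")
```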

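The log-likelihood function to be maximized and the first-order conditions for the tobit model with the error covariance structure in (2.17a) are as follows (additive constants omitted):

$$ \ln L = \sum_i\sum_t (1-d_{it})\ln(1-\Phi_{it}) + \sum_i\sum_t d_{it}\left[-\frac{1}{2}\ln\frac{\sigma_i^2+\delta_t^2}{2} - \frac{(y_{it}-X_{it}\beta)^2}{\sigma_i^2+\delta_t^2}\right], \tag{2.18} $$

$$ \frac{\partial\ln L}{\partial\beta} = -\sum_i\sum_t (1-d_{it})\frac{f_{it}X_{it}'}{(1-\Phi_{it})\,[(\sigma_i^2+\delta_t^2)/2]^{1/2}} + \sum_i\sum_t d_{it}\frac{2}{\sigma_i^2+\delta_t^2}(y_{it}-X_{it}\beta)X_{it}' = 0, \tag{2.18a} $$

$$ \frac{\partial\ln L}{\partial\sigma_i^2} = \sum_t (1-d_{it})\frac{f_{it}X_{it}\beta}{\sqrt{2}\,(1-\Phi_{it})(\sigma_i^2+\delta_t^2)^{3/2}} + \sum_t d_{it}\left[-\frac{1}{2(\sigma_i^2+\delta_t^2)} + \frac{(y_{it}-X_{it}\beta)^2}{(\sigma_i^2+\delta_t^2)^2}\right] = 0, \quad i = 1, 2, \ldots, N, \tag{2.18b} $$

$$ \frac{\partial\ln L}{\partial\delta_t^2} = \sum_i (1-d_{it})\frac{f_{it}X_{it}\beta}{\sqrt{2}\,(1-\Phi_{it})(\sigma_i^2+\delta_t^2)^{3/2}} + \sum_i d_{it}\left[-\frac{1}{2(\sigma_i^2+\delta_t^2)} + \frac{(y_{it}-X_{it}\beta)^2}{(\sigma_i^2+\delta_t^2)^2}\right] = 0, \quad t = 1, 2, \ldots, T, \tag{2.18c} $$

where

$$ \Phi_{it} = \Phi\!\left(\frac{X_{it}\beta}{[(\sigma_i^2+\delta_t^2)/2]^{1/2}}\right) \quad\text{and}\quad f_{it} = \phi\!\left(\frac{X_{it}\beta}{[(\sigma_i^2+\delta_t^2)/2]^{1/2}}\right), $$

with $\Phi$ and $\phi$ denoting the standard normal distribution and density functions.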
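A direct transcription of (2.18) may help when implementing the estimator. The data layout `(i, t, x_it, y_it)` with zero-based indices and the function names are assumptions of this sketch. Constants are dropped as in (2.18), and $\ln(1-\Phi_{it})$ is computed as $\ln\Phi(-z)$, which is numerically safer in the tail.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function Phi."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def loglik_218(beta, sig2, del2, data):
    """Log-likelihood (2.18) under the arithmetic-mean variance (2.17a).

    beta : list of k coefficients
    sig2 : list of sigma_i^2, i = 0..N-1
    del2 : list of delta_t^2, t = 0..T-1
    data : list of (i, t, x_it, y_it), x_it being a list of k regressors
    """
    ll = 0.0
    for i, t, x, y in data:
        v = (sig2[i] + del2[t]) / 2.0                      # variance under (2.17a)
        xb = sum(b * xv for b, xv in zip(beta, x))
        if y > 0:                                          # d_it = 1
            ll += -0.5 * math.log(v) - (y - xb) ** 2 / (2.0 * v)
        else:                                              # d_it = 0
            ll += math.log(norm_cdf(-xb / math.sqrt(v)))  # ln(1 - Phi_it)
    return ll


data = [(0, 0, [0.0], 0.0), (0, 1, [1.0], 2.0)]
print(loglik_218([1.0], [1.0], [1.0, 1.0], data))
```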


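The log-likelihood function to be maximized and the first-order conditions for the tobit model with the error covariance structure given in (2.17b) are as follows, with $\sigma_i = (\sigma_i^2)^{1/2}$ and $\delta_t = (\delta_t^2)^{1/2}$:

$$ \ln L = \sum_i\sum_t (1-d_{it})\ln(1-\Phi_{it}) + \sum_i\sum_t d_{it}\left[-\frac{1}{2}\ln(\sigma_i\delta_t) - \frac{(y_{it}-X_{it}\beta)^2}{2\sigma_i\delta_t}\right], \tag{2.19} $$

$$ \frac{\partial\ln L}{\partial\beta} = -\sum_i\sum_t (1-d_{it})\frac{f_{it}X_{it}'}{(1-\Phi_{it})(\sigma_i\delta_t)^{1/2}} + \sum_i\sum_t d_{it}\frac{(y_{it}-X_{it}\beta)X_{it}'}{\sigma_i\delta_t} = 0, \tag{2.19a} $$

$$ \frac{\partial\ln L}{\partial\sigma_i^2} = \sum_t (1-d_{it})\frac{f_{it}X_{it}\beta}{4(1-\Phi_{it})\,\sigma_i^2(\sigma_i\delta_t)^{1/2}} + \sum_t d_{it}\left[-\frac{1}{4\sigma_i^2} + \frac{(y_{it}-X_{it}\beta)^2}{4\sigma_i^3\delta_t}\right] = 0, \quad i = 1, 2, \ldots, N, \tag{2.19b} $$

$$ \frac{\partial\ln L}{\partial\delta_t^2} = \sum_i (1-d_{it})\frac{f_{it}X_{it}\beta}{4(1-\Phi_{it})\,\delta_t^2(\sigma_i\delta_t)^{1/2}} + \sum_i d_{it}\left[-\frac{1}{4\delta_t^2} + \frac{(y_{it}-X_{it}\beta)^2}{4\sigma_i\delta_t^3}\right] = 0, \quad t = 1, 2, \ldots, T, \tag{2.19c} $$

where

$$ \Phi_{it} = \Phi\!\left(\frac{X_{it}\beta}{(\sigma_i\delta_t)^{1/2}}\right) \quad\text{and}\quad f_{it} = \phi\!\left(\frac{X_{it}\beta}{(\sigma_i\delta_t)^{1/2}}\right). $$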
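Analytic first-order conditions such as (2.19a) and (2.19b) are easy to get wrong, so a useful check is to compare them with central finite differences of (2.19). The sketch below does this for a single regressor on a small simulated panel; the data-generating values and the function names are arbitrary choices for the illustration.

```python
import math
import random


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def loglik_219(beta, sig2, del2, data):
    """(2.19) for one regressor; data holds (i, t, x_it, y_it); constant dropped."""
    ll = 0.0
    for i, t, x, y in data:
        v = math.sqrt(sig2[i] * del2[t])                          # sigma_i * delta_t
        if y > 0:
            ll += -0.5 * math.log(v) - (y - x * beta) ** 2 / (2.0 * v)
        else:
            ll += math.log(norm_cdf(-x * beta / math.sqrt(v)))    # ln(1 - Phi_it)
    return ll


def score_beta(beta, sig2, del2, data):
    """Analytic derivative (2.19a)."""
    g = 0.0
    for i, t, x, y in data:
        v = math.sqrt(sig2[i] * del2[t])
        s = math.sqrt(v)                                          # (sigma_i delta_t)^(1/2)
        if y > 0:
            g += (y - x * beta) * x / v
        else:
            z = x * beta / s
            g -= norm_pdf(z) * x / ((1.0 - norm_cdf(z)) * s)
    return g


def score_sig2(i0, beta, sig2, del2, data):
    """Analytic derivative (2.19b) for cross section i0."""
    g = 0.0
    for i, t, x, y in data:
        if i != i0:
            continue
        si, dt = math.sqrt(sig2[i]), math.sqrt(del2[t])
        s = math.sqrt(si * dt)
        if y > 0:
            g += -1.0 / (4.0 * sig2[i]) + (y - x * beta) ** 2 / (4.0 * si ** 3 * dt)
        else:
            z = x * beta / s
            g += norm_pdf(z) * x * beta / (4.0 * (1.0 - norm_cdf(z)) * sig2[i] * s)
    return g


# Hypothetical panel: 3 cross sections, 4 periods, beta = 0.7.
rng = random.Random(3)
sig2, del2 = [0.5, 1.0, 2.0], [0.8, 1.2, 1.5, 1.0]
data = []
for i in range(3):
    for t in range(4):
        x = rng.gauss(0.0, 1.0)
        sd = (sig2[i] * del2[t]) ** 0.25                          # std. dev. under (2.17b)
        data.append((i, t, x, max(0.7 * x + rng.gauss(0.0, sd), 0.0)))

b, h = 0.4, 1e-5
num_beta = (loglik_219(b + h, sig2, del2, data) - loglik_219(b - h, sig2, del2, data)) / (2 * h)
up = [s + (h if k == 1 else 0.0) for k, s in enumerate(sig2)]
dn = [s - (h if k == 1 else 0.0) for k, s in enumerate(sig2)]
num_sig2 = (loglik_219(b, up, del2, data) - loglik_219(b, dn, del2, data)) / (2 * h)
print(score_beta(b, sig2, del2, data), num_beta)
print(score_sig2(1, b, sig2, del2, data), num_sig2)
```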

Both likelihood functions are functions of the parameters $\beta$, $\sigma_i^2$, and $\delta_t^2$. Hence, they have k+N+T unknown parameters to be estimated from NT observations. In the standard pooled cross-section and time-series model, where the number of parameters to be estimated is fixed, consistent estimation requires that the number of cross sections or the number of time periods tend to infinity.9 In the model suggested here, the number of parameters increases with the size of the data set. In this case, consistency of the estimator requires that the number of cross sections and the number of time periods go to infinity jointly, as shown below:

$$ \lim_{\substack{N\to\infty \\ T\to\infty}} \frac{k+N+T}{NT} = 0, \qquad\text{whereas}\qquad \lim_{N\to\infty \text{ or } T\to\infty \text{ alone}} \frac{k+N+T}{NT} \neq 0. $$

9 The consistency of estimators in the standard pooling model is well documented in Hsiao (1986) and Anderson and Hsiao (1982).
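A few numbers make the limit concrete. With T held fixed, the ratio settles near 1/T rather than zero, and it vanishes only when N and T grow together. The choice k = 5 is arbitrary.

```python
def param_ratio(k, n, t):
    """(k + N + T) / (N T): unknown parameters per observation in (2.18) and (2.19)."""
    return (k + n + t) / (n * t)


k = 5  # arbitrary number of slope coefficients
for n, t in [(100, 5), (1000, 5), (10000, 5), (100, 50), (1000, 500)]:
    print(f"N={n:>6}  T={t:>4}  ratio={param_ratio(k, n, t):.4f}")
```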


The assumption that the number of parameters to be estimated increases at a rate of smaller order than the sample size ensures the consistency of the estimation. However, consistency cannot be guaranteed in practical estimation of the model, because the number of parameters to be estimated is much larger than usual. Some parameters that depend on only a small number of observations may not be estimated consistently, and even the estimation of the common parameters may become troublesome.10

10 In the context of the model discussed in this study, the $\sigma_i^2$'s are related to only T observations and the $\delta_t^2$'s to only N observations. Therefore, if either N or T is small, the resulting estimates, including $\beta$, would not be consistent.

The alternative is to use the concentrated likelihood function, which reduces the number of parameters that must be estimated by iterative procedures. The concentrated maximum likelihood method first estimates a subset of the parameters. It then maximizes the likelihood with respect to the remaining parameters after substituting the values of the pre-estimated parameters. In this study, the estimated values of the variance parameters are substituted into the likelihood function, and $\beta$ is estimated. In principle, the formulas for the variance parameters should be obtained from the first-order conditions of each likelihood function. However, they cannot be obtained in closed form because of the complexity of the equations. Instead, the formula that can be derived for the standard tobit model is used as an approximation. The first-order conditions for the standard tobit model are as follows:

$$ \frac{\partial\ln L}{\partial\beta} = -\frac{1}{\sigma}\sum_0 \frac{f_i X_i'}{1-\Phi_i} + \frac{1}{\sigma^2}\sum_1 (y_i - X_i\beta)X_i' = 0, \tag{2.20a} $$

$$ \frac{\partial\ln L}{\partial\sigma^2} = \frac{1}{2\sigma^3}\sum_0 \frac{f_i X_i\beta}{1-\Phi_i} - \frac{N_1}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_1 (y_i - X_i\beta)^2 = 0, \tag{2.20b} $$

where $\sum_0$ denotes summation over the observations with $y_i = 0$, $\Phi_i = \Phi(X_i\beta/\sigma)$, and $f_i = \phi(X_i\beta/\sigma)$. Premultiplying equation (2.20a) by $\beta'/2\sigma^2$ and adding the result to equation (2.20b), we obtain the equation for $\sigma^2$:

$$ \sigma^2 = \frac{1}{N_1}\sum_1 (y_i - X_i\beta)\,y_i, \tag{2.21} $$
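The step from (2.20a) and (2.20b) to (2.21) can be written out explicitly. With $\sum_1$ and $N_1$ referring to the observations with $y_i > 0$, the terms involving the censored observations cancel:

$$
\begin{aligned}
\frac{\beta'}{2\sigma^2}\,\frac{\partial\ln L}{\partial\beta}
&= -\frac{1}{2\sigma^3}\sum_0 \frac{f_i X_i\beta}{1-\Phi_i} + \frac{1}{2\sigma^4}\sum_1 (y_i - X_i\beta)X_i\beta, \\
\frac{\beta'}{2\sigma^2}\,\frac{\partial\ln L}{\partial\beta} + \frac{\partial\ln L}{\partial\sigma^2}
&= -\frac{N_1}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_1 (y_i - X_i\beta)\big[(y_i - X_i\beta) + X_i\beta\big] \\
&= -\frac{N_1}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_1 (y_i - X_i\beta)\,y_i = 0,
\end{aligned}
$$

and solving the last line for $\sigma^2$ gives (2.21).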


where $\sum_1$ denotes summation over the observations with $y_i > 0$ and $N_1$ is the number of observations with a nonzero dependent variable. This formula can be used as an approximation both for the variance characterizing the error distribution of the time series for each cross section and for the variance representing the error distribution of the cross-sectional units in a given period. That is, the error variance of the time series for cross section i, denoted by $\sigma_i^2$, and the error variance of the cross-sectional units for period t, denoted by $\delta_t^2$, can be estimated as

$$ \hat\sigma_i^2 = \frac{1}{T_i^1}\sum_t d_{it}(y_{it} - X_{it}\beta)\,y_{it}, \quad i = 1, \ldots, N, \tag{2.22a} $$

$$ \hat\delta_t^2 = \frac{1}{N_t^1}\sum_i d_{it}(y_{it} - X_{it}\beta)\,y_{it}, \quad t = 1, \ldots, T, \tag{2.22b} $$

where $T_i^1 = \sum_t d_{it}$ and $N_t^1 = \sum_i d_{it}$ are the numbers of nonzero observations for cross section i and for period t, and $d_{it}$ is a dummy variable such that $d_{it} = 1$ if $y_{it} > 0$ and $d_{it} = 0$ if $y_{it} = 0$. These estimated variances, which are functions of $\beta$, are substituted into the log-likelihood function to construct the concentrated log-likelihood function:

$$ \ln L^*(\beta) = \ln L\big(\beta,\ \hat\sigma_1^2(\beta), \ldots, \hat\sigma_N^2(\beta),\ \hat\delta_1^2(\beta), \ldots, \hat\delta_T^2(\beta)\big). \tag{2.23} $$
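The concentration step can be sketched for the (2.17a) specification with one regressor: for a trial $\beta$, the variances are computed from (2.22a) and (2.22b), substituted into the log-likelihood (2.18), and the resulting function of $\beta$ alone is maximized. Several details are assumptions of this sketch rather than part of the method described here: a grid search over $\beta$ stands in for a gradient-based maximizer built on (2.24), variance estimates are floored at a small positive value because (2.22) can become non-positive for poor trial values of $\beta$, groups without nonzero observations fall back to the pooled estimate (2.21), and the data are simulated.

```python
import math
import random


def log_norm_cdf(z):
    return math.log(max(0.5 * math.erfc(-z / math.sqrt(2.0)), 1e-300))


def variance_estimates(beta, data, n_units, n_periods, floor=1e-6):
    """sigma_i^2 from (2.22a) and delta_t^2 from (2.22b); data = [(i, t, x, y), ...]."""
    pos = [(y - x * beta) * y for _, _, x, y in data if y > 0]
    pooled = max(sum(pos) / len(pos), floor)          # (2.21), used as a fallback
    num_i, cnt_i = [0.0] * n_units, [0] * n_units
    num_t, cnt_t = [0.0] * n_periods, [0] * n_periods
    for i, t, x, y in data:
        if y > 0:                                      # d_it = 1
            term = (y - x * beta) * y
            num_i[i] += term
            cnt_i[i] += 1
            num_t[t] += term
            cnt_t[t] += 1
    sig2 = [max(s / c, floor) if c else pooled for s, c in zip(num_i, cnt_i)]
    del2 = [max(s / c, floor) if c else pooled for s, c in zip(num_t, cnt_t)]
    return sig2, del2


def concentrated_loglik(beta, data, n_units, n_periods):
    """ln L*(beta) of (2.23) for the (2.17a) specification."""
    sig2, del2 = variance_estimates(beta, data, n_units, n_periods)
    ll = 0.0
    for i, t, x, y in data:
        v = (sig2[i] + del2[t]) / 2.0
        if y > 0:
            ll += -0.5 * math.log(v) - (y - x * beta) ** 2 / (2.0 * v)
        else:
            ll += log_norm_cdf(-x * beta / math.sqrt(v))
    return ll


# Hypothetical data: beta = 1, unit error variance, x ~ N(1, 1).
rng = random.Random(11)
N, T = 30, 10
data = []
for i in range(N):
    for t in range(T):
        x = rng.gauss(1.0, 1.0)
        data.append((i, t, x, max(x + rng.gauss(0.0, 1.0), 0.0)))

# The search runs over beta only: k = 1 dimension instead of k + N + T.
grid = [j / 100.0 for j in range(-300, 301)]
beta_hat = max(grid, key=lambda b: concentrated_loglik(b, data, N, T))
print(beta_hat)
```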


The concentrated log-likelihood function is a function of $\beta$ only, and it is maximized with respect to $\beta$ alone. The first-order condition, which can be obtained using the chain rule for derivatives, is as follows:

$$ \frac{\partial\ln L^*}{\partial\beta} = \frac{\partial\ln L}{\partial\beta} + \sum_i \frac{\partial\ln L}{\partial\sigma_i^2}\,\frac{\partial\hat\sigma_i^2}{\partial\beta} + \sum_t \frac{\partial\ln L}{\partial\delta_t^2}\,\frac{\partial\hat\delta_t^2}{\partial\beta} = 0. \tag{2.24} $$

The terms in equation (2.24) can be obtained from the first-order conditions of each model specification. In maximizing the concentrated log-likelihood function, the gradient vector has dimension k×1 instead of (k+N+T)×1, and the Hessian matrix has dimension k×k instead of (k+N+T)×(k+N+T). The advantages of the concentrated maximum likelihood method are twofold. First, it saves computing time, since inverting a k×k matrix is much simpler. Second, and more critically, the standard maximum likelihood method may fail to reach the maximum when there are too many variance parameters to be estimated, because the log-likelihood function and/or the normal equations are very sensitive to the values of the variance parameters; in some cases the iterations do not converge at all.

Specification Analysis

Specification tests for modeling standard panel data include Breusch and Pagan's LM test, which tests for the existence of individual specific effects, and the Hausman test, which is used to detect general misspecification. The Hausman test can be extended to censored panel data if appropriate estimators can be found. Nelson (1981) derives a test statistic using the ML estimator and a method of moments (MOM) estimator.11 However, his statistic is not suitable for testing fixed effects versus random effects, since it is a specification test against the general alternative hypothesis of misspecification. Breusch and Pagan's LM test examines the existence of individual specific effects; that is, it tests $\operatorname{var}(\alpha_i) = 0$. Unfortunately, Breusch and Pagan's test cannot be directly extended to the case of censored panel data.

Conventional stochastic assumptions for the individual specific term and the error term lead to $E(u_{it}u_{is}) = \sigma_\alpha^2$ for $t \ne s$, as shown in equation (2.16). This makes estimation with censored panel data intractable and presents difficulties when the model specification is dynamic. The latter problem is critical, since one of the attractions of pooling models over simple cross-section models is presumably that they facilitate the incorporation of some form of dynamic behavior into the model. Therefore, the assumption of zero error covariance between observations in different time periods for each individual is critical and should be tested.

The test statistic for the hypothesis $E(u_{it}u_{is}) = 0$, $t \ne s$, is derived from the formula for the truncated normal distribution.12 The mean of the truncated normal distribution can be expressed as the conditional mean of the nonzero observations, which is given by

$$ E_c(u_{it}) = E(u_{it} \mid u_{it} > -X_{it}\beta) = \frac{\phi(-X_{it}\beta/\sigma_{it})}{1 - \Phi(-X_{it}\beta/\sigma_{it})} = M_{it}. \tag{2.25} $$

Because this expression and the conditional variance used below are the moments of a unit-variance error, $u_{it}$ here is to be understood as standardized by $\sigma_{it}$, and the residuals below should be standardized by $\hat\sigma_{it}$ accordingly.

11 MOM estimators, which are obtained from the sample moments, are consistent but not asymptotically efficient. Therefore, they can be used for a Hausman specification test together with the MLE, which is asymptotically efficient but inconsistent under misspecification.

12 For the formula of the truncated normal distribution, see the appendix in Maddala (1983).

Under the null hypothesis, a consistent estimator of $u_{it}$ conditional on $u_{it} > -X_{it}\beta$ is obtained from

$$ \hat u_{it} = y_{it} - X_{it}\hat\beta, $$

where $\hat\beta$ is the MLE of $\beta$. Then a consistent estimator of $\operatorname{Cov}(u_{it}, u_{is})$ is obtained as

$$ \hat C_{its} = (\hat u_{it} - M_{it})(\hat u_{is} - M_{is}). \tag{2.26} $$


We can derive the asymptotic variance of $\hat C_{its}$ as

$$ V_c(\hat C_{its}) = E_c\big[(u_{it}-M_{it})^2(u_{is}-M_{is})^2\big] - \big\{E_c[(u_{it}-M_{it})(u_{is}-M_{is})]\big\}^2. $$

We can show that $E_c[(u_{it}-M_{it})(u_{is}-M_{is})] = 0$ using the independence condition implicit in the null hypothesis. The fact that $u_{it}$ and $u_{is}$ are independent implies that

$$ E_c(u_{it}u_{is}) = E_c(u_{it})\,E_c(u_{is}) = M_{it}M_{is}. $$

Therefore, we have the following result:

$$
\begin{aligned}
E_c[(u_{it}-M_{it})(u_{is}-M_{is})] &= E_c[u_{it}u_{is} - M_{it}u_{is} - M_{is}u_{it} + M_{it}M_{is}] \\
&= E_c(u_{it}u_{is}) - M_{it}E_c(u_{is}) - M_{is}E_c(u_{it}) + M_{it}M_{is} \\
&= 2M_{it}M_{is} - M_{it}M_{is} - M_{is}M_{it} \\
&= 0.
\end{aligned}
$$

Hence, we get the result

$$
\begin{aligned}
V_c(\hat C_{its}) &= E_c[(u_{it}-M_{it})^2(u_{is}-M_{is})^2] \\
&= E_c[(u_{it}-M_{it})^2]\,E_c[(u_{is}-M_{is})^2] \\
&= V_c(u_{it})\,V_c(u_{is}).
\end{aligned}
$$

Since $V_c(u_{it}) = 1 - M_{it}(M_{it} + X_{it}\beta/\sigma_{it})$, we can obtain the estimate as follows (see Maddala, 1983, p. 365):

$$ \hat V_{its} = \left[1 - M_{it}\left(M_{it} + \frac{X_{it}\hat\beta}{\hat\sigma_{it}}\right)\right]\left[1 - M_{is}\left(M_{is} + \frac{X_{is}\hat\beta}{\hat\sigma_{is}}\right)\right]. \tag{2.26} $$
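The truncated-normal moments used in (2.25) and in the variance formula above can be checked by simulation for a standardized error. The sketch draws standard normal variates, keeps those above $c = -X_{it}\beta/\sigma_{it}$ for an arbitrary value of $X_{it}\beta/\sigma_{it}$, and compares the sample mean and variance with the formulas. It also checks that the centred cross-product of independent draws averages to zero, as in the derivation above.

```python
import random
from statistics import NormalDist, fmean, pvariance

STD = NormalDist()  # standard normal


def truncated_moments(xb_over_sigma):
    """M_it of (2.25) and V_c(u_it) = 1 - M_it (M_it + X_it beta / sigma_it)
    for a standardized error truncated below at c = -X_it beta / sigma_it."""
    c = -xb_over_sigma
    m = STD.pdf(c) / (1.0 - STD.cdf(c))
    return m, 1.0 - m * (m + xb_over_sigma)


rng = random.Random(2024)
q = 0.4                                   # arbitrary value of X_it beta / sigma_it
draws = [z for z in (rng.gauss(0.0, 1.0) for _ in range(200000)) if z > -q]
m, v = truncated_moments(q)

# Independent pairs: E_c[(u_it - M_it)(u_is - M_is)] should be close to zero.
cross = fmean([(a - m) * (b - m) for a, b in zip(draws[0::2], draws[1::2])])

print("formula:    M =", round(m, 4), " V =", round(v, 4))
print("simulation: M =", round(fmean(draws), 4), " V =", round(pvariance(draws), 4))
print("centred cross-moment:", round(cross, 4))
```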


Now we have a statistic that is asymptotically distributed as N(0,1):

$$ \eta_i = \frac{1}{\sqrt{k_i}} \sum_{s > t} \frac{\hat C_{its}}{\hat V_{its}^{1/2}}, \tag{2.27} $$

where the summation is over all available covariance terms for the set of non-limit observations of individual i, and $k_i$ represents the number of covariance terms.13 The test statistic for the hypothesis $E(u_{it}u_{is}) = 0$ for $t \ne s$ is obtained by summing the squares of the standard normal statistics in (2.27). The resulting statistic is asymptotically distributed as

$$ \sum_{i=1}^{N^*} \eta_i^2 \sim \chi^2(N^*), \tag{2.28} $$

where the degrees of freedom $N^*$ represent the number of cross sections on which $\eta_i$ is defined, that is, the number of cross sections that have at least two nonzero observations.
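Putting (2.25) through (2.28) together, the statistic can be computed from standardized non-limit residuals. In this sketch the input layout (for each cross section, a list of `(u_hat, xb_over_sigma)` pairs, with $\hat u_{it}$ already divided by $\hat\sigma_{it}$) and the function names are assumptions. The result is to be compared with a chi-square critical value with $N^*$ degrees of freedom.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def mills(xb_over_sigma):
    """M_it = phi(c) / (1 - Phi(c)) with c = -X_it beta / sigma_it, as in (2.25)."""
    c = -xb_over_sigma
    return STD.pdf(c) / (1.0 - STD.cdf(c))


def serial_test(units):
    """Statistic (2.28) and its degrees of freedom N*.

    units: one list per cross section of (u_hat, xb_over_sigma) pairs for the
           non-limit observations, u_hat being standardized by sigma_hat.
    """
    stat, n_star = 0.0, 0
    for obs in units:
        if len(obs) < 2:                 # eta_i needs at least two non-limit observations
            continue
        m = [mills(q) for _, q in obs]
        v = [1.0 - mi * (mi + q) for mi, (_, q) in zip(m, obs)]
        total, k = 0.0, 0
        for t in range(len(obs)):
            for s in range(t + 1, len(obs)):
                c_ts = (obs[t][0] - m[t]) * (obs[s][0] - m[s])   # (2.26)
                total += c_ts / math.sqrt(v[t] * v[s])            # C / V^(1/2)
                k += 1                                            # k_i = n(n-1)/2
        eta = total / math.sqrt(k)                                # (2.27)
        stat += eta ** 2
        n_star += 1
    return stat, n_star


m0 = mills(0.0)
example = [[(m0 + 1.0, 0.0), (m0 + 1.0, 0.0)],   # two non-limit observations
           [(0.3, 0.5)]]                         # one only: eta_i undefined, skipped
stat, df = serial_test(example)
print(round(stat, 4), df)
```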

13 That is, if n is the number of non-limit observations for individual i, $k_i = n(n-1)/2$.

Another test that can be used as a specification test is a parameter stability test such as the Chow test. The Chow test has some power against a wide range of possible alternatives, so it may provide a convenient specification test. Anderson (1987) developed a $\chi^2$ statistic for testing predictions in limited dependent variable models. The statistic is obtained from the log-likelihood for the full sample and a log-likelihood calculated from a subset of the sample. Prediction tests are usually considered in the context of time-series models rather than cross-sectional studies because of the nature of the data. Limited dependent variable models are more frequently cross-sectional in nature, so this kind of prediction test seems inappropriate for them. However, structural change need not be a function only of the passage of time, and Anderson (1987) proposes prediction tests for cross-section models and demonstrates that they have power. The test is an analogue of the Chow test that uses only the maximized values of the likelihood functions over two samples: the full sample and a subsample. While the statistic can be applied simply to logit and probit models, it is not as straightforward to apply it to tobit models. The following approximation has been suggested.14


r n +n 

— -V 


nl+n2 


ln<s^;2)  ■ lnLNi 


(2.29) 


where $N_1$ denotes the number of individuals in the first subsample, $n_1$ and $n_2$ are the numbers of nonzero observations in observations 1 to $N_1T$ and $N_1T+1$ to $NT$, respectively, and L denotes the value of the log-likelihood function evaluated over the two samples. The statistic is asymptotically distributed as $\chi^2$ with $NT - N_1$ degrees of freedom. This prediction test can be used to test parameter stability: when the parameters estimated from the subsample are compared with those from the full sample, a correctly specified model will show more stability than a misspecified model. Therefore, the test may be used to evaluate different model specifications, such as the fixed effects versus the random effects specification, or homoscedasticity versus heteroscedasticity.


14 The statistic for the censored regression model is an approximation to the exact statistic, since the formula for the exact statistic is very complicated and the approximation causes no difficulty asymptotically. The exact formula can be found in Anderson (1987, p. 260).


CHAPTER  3 

REVIEW  OF  THE  LITERATURE  ON  DIVIDEND  BEHAVIOR 
Introduction 

Firms' dividend policy is one of the puzzles in the field of finance. Many issues about dividend policy can be reduced to the following three fundamental questions.

1.  What  determines  the  dividend  decisions  of  firms? 

2.  Why  do  firms  pay  dividends  in  spite  of  the  costs? 

3.  Is  the  common  stock  price  affected  by  the  firm's  dividend 
policy? 

Miller and Modigliani (1961) proved a dividend irrelevancy proposition: the value of the firm is unaffected by its dividend policy in a world without taxes or transaction costs.1 Capital structure theory shows that, in a world without taxes, agency costs, or information asymmetry, repackaging the firm's net operating cash flows into fixed cash flows for debt and residual cash flows for shareholders has no effect on the value of the firm. Thus, in the absence of taxes, agency costs, or information asymmetry, dividend policy is irrelevant and does not affect shareholders' wealth.

1 Miller and Scholes (1978) show that the M-M proposition may survive even if dividends and capital gains are taxed differently, that is, even if the tax rate on ordinary personal income is greater than the capital gains tax rate.


Much theoretical and empirical effort has been devoted to resolving the puzzle posed by the Miller and Modigliani (M-M) dividend irrelevancy proposition and to answering the questions on dividend policy. This work can be divided into three parts according to which condition of the dividend irrelevancy proposition (absence of taxes, absence of agency costs, or absence of asymmetric information) is relaxed. Early studies focused on the tax structure, while recent studies are concerned with the information structure and agency costs. The signaling hypothesis investigates the transfer of asymmetric information and can be applied to the stock market to explain dividend policy. The agency cost model of dividend behavior applies the principal-agent model to the manager-stockholder relationship.

Three  topics  which  are  related  to  empirical  studies  on  dividend 
behavior  are  discussed  in  this  chapter.  They  are:  1)  modeling  the 
dividend  behavior;  2)  testing  the  signaling  hypothesis;  3)  investigating 
the  effect  of  dividend  policy  on  common  stock  price. 

The Miller and Modigliani Dividend Irrelevancy Proposition

Miller and Modigliani's (1961) dividend irrelevancy proposition is summarized in the following quotation:

It must follow that the current valuation is unaffected by
differences in dividend payments in any future period and thus
dividend policy is irrelevant for the determination of market
prices, given investment policy. (Miller and Modigliani, 1961, p. 429)

The key to the M-M argument is that investment decisions are completely independent of dividend policy. That is, the firm is able to pay any level of dividends it wishes without affecting its investment decisions. If dividends and desired investment outlays use more cash flow than is provided by operations, the firm should seek external financing to cover the difference. The desire to maintain a certain level of dividends need never affect the investment decision.

Miller and Modigliani applied the fundamental valuation principle in a multiperiod model. This principle says that the price of each share must be such that the rate of return on every share is the same throughout the market over any given interval of time. If we let

$d_i(t)$ = dividend per share paid by firm i during period t,
$p_i(t)$ = the price of a share in firm i at the start of period t,

it follows that the market rate of return during period t, denoted by $\rho(t)$, is independent of i and is expressed as

$$ \rho(t) = \frac{d_i(t) + p_i(t+1) - p_i(t)}{p_i(t)}. $$

Equivalently,

$$ p_i(t) = \frac{d_i(t) + p_i(t+1)}{1 + \rho(t)}. \tag{3.1} $$


If we let $n_i(t)$ be the number of shares outstanding at the start of period t, the value of the firm is written as

$$ V_i(t) = n_i(t)\,p_i(t) = \frac{D_i(t) + n_i(t)\,p_i(t+1)}{1 + \rho(t)}, \tag{3.2} $$

where $D_i(t) = n_i(t)\,d_i(t)$ is the total of dividends paid during period t.


Under the assumption of an all-equity firm, we have the following identity for the sources and uses of funds:

$$ X_i(t) + m_i(t+1)\,p_i(t+1) = I_i(t) + D_i(t), \tag{3.3} $$

where
$X_i(t)$ = the firm's total net profit for period t,
$I_i(t)$ = the firm's investment, or increase in its holding of physical assets, in period t,
$m_i(t+1)$ = the number of new shares sold at the ex-dividend price $p_i(t+1)$, so that $n_i(t+1) = n_i(t) + m_i(t+1)$.


Using the identity (3.3) and the equation $n_i(t) = n_i(t+1) - m_i(t+1)$, the numerator of the valuation equation can be written as

$$
\begin{aligned}
D_i(t) + n_i(t)\,p_i(t+1) &= D_i(t) + n_i(t+1)\,p_i(t+1) - m_i(t+1)\,p_i(t+1) \\
&= D_i(t) + V_i(t+1) - I_i(t) + X_i(t) - D_i(t) \\
&= X_i(t) - I_i(t) + V_i(t+1).
\end{aligned}
$$

Therefore, the valuation equation is written as

$$ V_i(t) = \frac{X_i(t) - I_i(t) + V_i(t+1)}{1 + \rho(t)}. \tag{3.4} $$


Clearly, the present value of the firm is independent of its dividend payout, because dividend terms do not appear in the valuation equation.
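A numerical example confirms the algebra of (3.2) through (3.4). The values of profit, investment, next-period firm value, and the market rate of return are hypothetical. Whatever dividend is paid, the firm raises the equity required by (3.3); reading a negative amount as a share repurchase is an assumption of this sketch. The value $V_i(t)$ computed from (3.2) is the same for every dividend level and equals the value from (3.4).

```python
def value_via_dividends(d_total, new_equity, v_next, rho):
    """V_i(t) from (3.2): [D_i(t) + n_i(t) p_i(t+1)] / (1 + rho), using
    n_i(t) p_i(t+1) = V_i(t+1) - m_i(t+1) p_i(t+1)."""
    return (d_total + v_next - new_equity) / (1.0 + rho)


def value_via_cash_flow(x, inv, v_next, rho):
    """V_i(t) from (3.4): [X_i(t) - I_i(t) + V_i(t+1)] / (1 + rho)."""
    return (x - inv + v_next) / (1.0 + rho)


X, I, V_next, rho = 100.0, 60.0, 1000.0, 0.10   # hypothetical figures
for D in (0.0, 40.0, 90.0):
    new_equity = I + D - X      # m_i(t+1) p_i(t+1) implied by identity (3.3)
    print(D, round(value_via_dividends(D, new_equity, V_next, rho), 6),
          round(value_via_cash_flow(X, I, V_next, rho), 6))
```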

The M-M argument can be extended to include corporate taxes and growth. But what happens in a more realistic world where personal income taxes are levied and the personal tax rate is greater than the corporate tax rate? Farrar and Selwyn (1967), Brennan (1970), and Miller and Scholes (1978, 1982) consider the effects of dividend policy in a world with corporate and personal taxes. Farrar and Selwyn use partial equilibrium analysis under the assumption that shareholders attempt to maximize their after-tax income, while Brennan uses general equilibrium analysis under the assumption that shareholders try to maximize their expected utility of wealth. The two analyses reach much the same conclusion. They suggest that it would be optimal to pay no dividends at all, because of the tax disadvantage of dividend payouts relative to capital gains when ordinary income tax rates are greater than capital gains tax rates. The implication of their conclusion is that the M-M dividend irrelevancy proposition does not hold in a world with personal income taxes, because paying dividends affects the value of the firm. On the other hand, Miller and Scholes (1978) show that even if the personal income tax rate is greater than the capital gains tax rate, shareholders will be indifferent between payments in the form of dividends and capital gains from the firm's share repurchases. Thus, they argue that the firm's value is unrelated to its dividend policy.

The theoretical validity of the M-M dividend irrelevancy proposition is still in question, and different empirical results have been reported. Some of the empirical studies are discussed in a later section of this chapter.

The Signaling Hypothesis in Dividend Theory

Given the M-M dividend irrelevancy proposition, the value of the firm is not affected by its dividend policy. Moreover, dividends should not be paid in a world where dividends have a tax disadvantage. Why, then, do firms pay dividends despite the cost?

An application of the signaling approach to dividend theory attempts to explain the reason for paying dividends. Signaling is one of the topics in the economics of information, which investigates how a signal can serve as a proxy variable that reduces the market uncertainty stemming from asymmetric information.2 For a signal to be effective in a market with asymmetric information, there must be a negative correlation between the signaling cost and the quality that the signal identifies. In stock markets, dividend payouts may be used as a signal of the true value of the firm. First, there is an asymmetry of information in the stock market: the managers of the firm have better information about the value of the firm than outside shareholders. Second, paying dividends involves costs, since external funds are more costly to use than internal funds. Third, there is a negative correlation between dividend costs and the quality of the firm, in the sense that paying dividends is a heavier burden for firms with a greater probability of bankruptcy than for firms with prosperous futures.

The idea of treating the dividend payout as a signal that
conveys information about the firm's true value stems from the
information content of dividends (ICD) hypothesis suggested by Miller
and Modigliani (1961):

That is, where a firm has adopted a policy of dividend
stabilization with a long-established and generally appreciated
"target payout ratio", investors are likely to (and have good
reason to) interpret a change in the dividend rate as a change in
management's views of future profit prospects for the firm
(Miller and Modigliani, 1961, p. 130)

2 The concept of signaling was first studied in the context of job
and product markets by Akerlof (1970), and was developed into an
equilibrium theory by Spence (1973) and Riley (1975).

Since the ICD hypothesis is not derived from a well-specified model,
dividend signaling models can be interpreted as its formal counterpart.
The advantage of developing a well-specified signaling model is that the
resulting equilibrium relationship can be used to produce testable
hypotheses. The first signaling model using dividends as a signal of a
firm's true value is that of Bhattacharya (1979). Under its set of
assumptions and a specific signaling cost function, the objective function
of current shareholders and their agents is written as follows:3

$$\max_{D}\; E(D) = \frac{1}{1+\gamma}\left[ V(D) + B\int_{\underline{X}}^{D} (X - D)\, f(X)\, dX \right] \tag{3.5}$$

where $E$ = firm value at time 0,
$\gamma$ = risk-free market rate of interest,
$V$ = expected liquidation value of the firm at time 1,
$D$ = dividend paid at time 1,
$X$ = uncertain cash flow at time 1, assumed to be distributed over $(\underline{X}, \overline{X})$,
$f(X)$, $F(X)$ = density and distribution function of $X$,
$B(X - D)$ = penalty to those firms unable to pay $D$, i.e., when $X - D < 0$.

3 For the assumptions of the dividend signaling model, see
Bhattacharya (1979) or Eades (1982). The model in this dissertation
follows the framework of Eades, which simplifies Bhattacharya's model by
assuming a single-period rather than a multiperiod framework and
neglecting the case when X > D. Another critical difference is that X
is assumed to have a normal distribution instead of a uniform
distribution.

The first term of the objective function is the value function, which
represents the benefit from signaling, while the second term represents
the expected cost of signaling. The first-order condition is

$$\frac{\partial E(D)}{\partial D} = \frac{1}{1+\gamma}\left[ V'(D) - B F(D) \right] = 0,$$

and we get the condition $V'(D) = B F(D)$, which implies that the marginal
benefits and the marginal costs of signaling are equated. The second-order
condition is the Spence-type existence condition which, for this
model, requires that the positive marginal signaling cost be
decreasing with respect to the true determinant of firm value. Under
the assumption that X is normally distributed with mean M and variance
$\sigma^2$, the mean cash flow M can be considered the determinant
of true firm value. Therefore, if the condition

$$\frac{\partial [B F(D)]}{\partial M} = -B f(D) < 0$$

is satisfied, then the second-order condition will be met, and there may exist a
signaling equilibrium D at which E(D) is maximized.
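The first-order and second-order conditions above can be checked numerically. The sketch below is an illustration only. It rests on an assumption that is not part of Bhattacharya's or Eades's models: the value function is taken to be the hypothetical form V(D) = v·log(1 + D). X is normal with mean M and standard deviation σ, as in the text. The code solves V'(D) = BF(D) by bisection. It then checks by finite differences that ∂[BF(D)]/∂M = −Bf(D).

```python
import math

def normal_cdf(x, mean, sd):
    """F(x) for X ~ N(mean, sd^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def normal_pdf(x, mean, sd):
    """f(x) for X ~ N(mean, sd^2)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def optimal_dividend(v, B, M, sigma, tol=1e-10):
    """Solve the first-order condition V'(D) = B F(D) for D >= 0.

    Uses the HYPOTHETICAL value function V(D) = v * log(1 + D), so that
    V'(D) = v / (1 + D). The gap V'(D) - B F(D) is strictly decreasing
    in D, so bisection finds the unique root when one exists.
    """
    gap = lambda d: v / (1.0 + d) - B * normal_cdf(d, M, sigma)
    lo, hi = 0.0, max(M, 0.0) + 10.0 * sigma
    if gap(lo) <= 0.0:
        return lo                      # corner solution: pay no dividend
    while gap(hi) > 0.0:               # widen the bracket until the sign flips
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def marginal_cost_slope(B, D, M, sigma, h=1e-5):
    """Central-difference estimate of d[B F(D)]/dM."""
    up = B * normal_cdf(D, M + h, sigma)
    down = B * normal_cdf(D, M - h, sigma)
    return (up - down) / (2.0 * h)

if __name__ == "__main__":
    d10 = optimal_dividend(v=5.0, B=2.0, M=10.0, sigma=3.0)
    d12 = optimal_dividend(v=5.0, B=2.0, M=12.0, sigma=3.0)
    print(d10, d12)  # a higher mean cash flow M gives a higher dividend
    print(marginal_cost_slope(2.0, d10, 10.0, 3.0), -2.0 * normal_pdf(d10, 10.0, 3.0))
```

In this parameterization, a larger σ also lowers the chosen dividend. That happens because the solution lies below M, where a wider distribution raises F(D). The sign is a feature of these assumed numbers, not a general result.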

The signaling models suggested by Ross (1977), Bhattacharya
(1979), and Eades (1982) are direct applications to financial markets of
the signaling model developed by Spence (1973). They demonstrate
that a signaling equilibrium is possible in financial markets when the
marginal cost of signaling is positive and decreasing with respect to the
true determinant of value. The results indicate that the equilibrium
dividend level is positively influenced by changes in the firm's true
value and negatively related to the firm's risk. Kalay (1980) applies
Ross's signaling incentive model to the dividend decision to show that a
dividend signaling equilibrium cannot exist without an assumption of
reluctance to cut dividends.4 Therefore, following Kalay, evidence
supporting the reluctance to cut dividends suggests the existence of a
signaling equilibrium in which dividends are used as a signal, and the
existence of such an equilibrium implies that dividends have the potential
to convey information.5

A well-designed financial signaling model, suggested by Miller and
Rock (1985), explicitly combines dividends and external financing. They
introduce the more plausible assumption of asymmetric information that
managers know more than outside shareholders about the true state of
the firm's current earnings.6

Every firm is subject to the following sources-and-uses-of-funds
constraint:7

$$X_1 + m_1 P_1 + B_1 = I_1 + D_1, \tag{3.6}$$

where X, m, and I are defined as in (3.3), and B denotes new bond
issues.

4 Bhattacharya (1979) assumes an asymmetric cost of dividend
changes such that dividend reductions are more costly than dividend
increases. This assumption can be interpreted as a reluctance to cut
dividends.

5 This is only a "potential" because managerial reluctance to cut
dividends is only a necessary, not a sufficient, condition for dividends
to convey information.

6 The standard finance model of optimal investment, financing, and
dividend decisions for the firm assumes symmetric information.

7 In Miller and Modigliani (1961), the term B, which represents new
bond issues (debt), does not appear because the firm is assumed to be an
all-equity firm. On the other hand, Miller and Rock (1985) assume a
perfect market and full information. Therefore, it does not matter
whether external financing is via new share issues or bond issues.
In the original paper by Miller and Rock, the term B represents any
additional funds raised.

The condition can be rearranged as

$$X_1 - I_1 = D_1 - m_1 P_1 - B_1, \tag{3.7}$$

where the left-hand side is the net cash flow and the right-hand side
represents the net dividend. Given that future profit is a
function of current investment, the evolution of the firm's profit
stream, $X_t$, can be described by the following equations:


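$$X_1 = F(I_0) + \varepsilon_1, \qquad X_2 = F(I_1) + \varepsilon_2 = F(X_1 + m_1 P_1 + B_1 - D_1) + \varepsilon_2, \tag{3.8}$$

where $\varepsilon_1$ and $\varepsilon_2$ are random errors such that8

$$E_0(\varepsilon_1) = E_0(\varepsilon_2) = 0, \qquad E(\varepsilon_2 \mid \varepsilon_1) = \gamma\,\varepsilon_1. \tag{3.9}$$

Using equations (3.7)-(3.9), the cum-dividend value of the shares
after the dividend announcement is expressed as follows:

$$\begin{aligned} V_1 &= D_1 - m_1 P_1 - B_1 + \frac{1}{1+i} E_1(X_2) \\ &= X_1 - I_1 + \frac{1}{1+i} E_1(X_2) \\ &= F(I_0) + \varepsilon_1 - I_1 + \frac{1}{1+i} E_1[F(I_1) + \varepsilon_2] \\ &= F(I_0) + \varepsilon_1 - I_1 + \frac{1}{1+i}[F(I_1) + \gamma\,\varepsilon_1]. \end{aligned} \tag{3.10}$$

On the other hand, the ex-dividend value of the share can be expressed as
the expected value of (3.10):

$$E_0(V_1) = E_0\left[F(I_0) + \varepsilon_1 - I_1 + \frac{1}{1+i}\big(F(I_1) + \varepsilon_2\big)\right] = F(I_0) - I_1 + \frac{1}{1+i} F(I_1). \tag{3.11}$$

8 The coefficient $\gamma$ can be interpreted as a persistence coefficient.
If $\gamma = 1$, the random shock in period 1 is permanent; if $\gamma = 0$, the shock is
transitory; and if $0 < \gamma < 1$, the market is assumed to adjust partially to
the random shock.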

Subtracting $E_0(V_1)$ from $V_1$, we have the following expression, which can
be interpreted as the effect of the dividend payout on firm value:

$$V_1 - E_0(V_1) = \varepsilon_1\left(1 + \frac{\gamma}{1+i}\right) = [X_1 - E_0(X_1)]\left(1 + \frac{\gamma}{1+i}\right). \tag{3.12}$$

Under the assumption of rational expectations, expected and actual
investment are at the optimum level, i.e.,

$$I_1 = E_0(I_1) = I^{*},$$

where $I^{*}$ represents optimum investment. Then, at optimum investment, the
difference between the actual net dividend and the expected net dividend
(the unexpected dividend change) can be expressed in terms of X if we use the
identity (3.7):

$$(D_1 - m_1 P_1 - B_1) - E_0(D_1 - m_1 P_1 - B_1) = X_1 - E_0(X_1). \tag{3.13}$$

From equations (3.12) and (3.13), we see that the unexpected change
in net dividends conveys the same information as the unexpected change in
earnings. Thus, we can conclude that the unexpected change in dividends
will affect the firm's value.
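Equations (3.10) through (3.13) can be checked with numbers. The sketch below uses a hypothetical production function, F(I) = 2√I. It also uses arbitrary illustrative values for I_0, I*, γ, i, m_1P_1, and B_1, and treats external financing as known in advance. None of these come from Miller and Rock. The code computes V_1 from (3.10) and E_0(V_1) from (3.11). It then confirms two things. The valuation surprise equals the earnings surprise times (1 + γ/(1 + i)). The net dividend surprise equals the earnings surprise.

```python
import math

def F(I):
    """Hypothetical concave production function; the identities hold for any F."""
    return 2.0 * math.sqrt(I)

def miller_rock(eps1, I0=4.0, I_star=9.0, gamma=0.6, i=0.05,
                new_equity=1.0, new_debt=0.5):
    """Return (V1 - E0[V1], X1 - E0[X1], net dividend surprise) for a shock eps1.

    Investment is held at I1 = E0(I1) = I_star; new_equity stands for m1*P1
    and new_debt for B1, both assumed known before the announcement.
    """
    X1 = F(I0) + eps1                                          # (3.8)
    EX1 = F(I0)                                                # E0(eps1) = 0
    V1 = X1 - I_star + (F(I_star) + gamma * eps1) / (1 + i)    # (3.10)
    EV1 = F(I0) - I_star + F(I_star) / (1 + i)                 # (3.11)
    D1 = X1 + new_equity + new_debt - I_star                   # from (3.6)
    ED1 = EX1 + new_equity + new_debt - I_star
    net_div_surprise = (D1 - new_equity - new_debt) - (ED1 - new_equity - new_debt)
    return V1 - EV1, X1 - EX1, net_div_surprise                # (3.12), (3.13)

for eps in (-1.5, 0.0, 0.8):
    print(eps, miller_rock(eps))
```

The financing terms cancel in the net dividend. Once investment is pinned at I*, the only surprise left in the net dividend is the earnings surprise ε_1.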

The  Agency  Cost  Hypothesis  in  Dividend  Theory 

Another recent approach to the dividend puzzle is an application
of principal-agent theory. The proponents of the agency cost hypothesis
argue that the signaling approach is not a good device for explaining the
dividend puzzle. They reason that it is unclear what dividends signal,
how signaling is done, or why dividends are a better signal than other,
cheaper methods. Furthermore, dividends do not directly reveal the
prospects of the firm, so the message they convey may be ambiguous (see
Easterbrook, 1984, for details).

The principal-agent relationship can be defined as a contract
under which one or more persons (the principals) engage another person
(the agent) to perform some service on their behalf. This involves
delegating some decision-making authority to the agent. The agency
problem comes from the possibility of moral hazard. Because both
parties to the principal-agent relationship are utility maximizers, the
agent will try to maximize his own utility rather than the principals'
utility. Agency cost is the expenditure needed to prevent the agent from
acting against the principals' interest and to induce the agent to behave
as if he were maximizing the principals' welfare. Following Jensen and
Meckling (1976), agency cost is the sum of the monitoring expenditure by
the principal, the bonding expenditure by the agent, and the residual
loss from slippage.




Agency  cost  theory  can  be  applied  to  dividends  since  dividend 
payout  may  well  serve  as  a means  of  monitoring  or  bonding  management 
performance.  Greater  dividend  payout  implies  costly  external  financing, 
and  the  very  fact  that  the  firm  must  go  to  the  financial  markets  implies 
that  the  firm  will  be  monitored  by  loan,  financing,  and  underwriting 
agencies.  Therefore,  the  agency  cost  model  of  dividends  examines  the 
choice  of  the  appropriate  probability  for  a firm  to  visit  the  financial 
markets.  The  target  value  of  the  probability  is  determined  by  the 
trade-off  between  the  benefit  of  monitoring  and  the  cost  of  inducing  the 
monitoring. 

The sources and uses of funds can be rewritten as follows:

$$X_t + S_t + B_t = I_t + D_t, \tag{3.14}$$

where $S_t$ represents new share issues during period t, and the variables $X_t$,
$I_t$, $B_t$, and $D_t$ are the same as in (3.6) and (3.7). Let us assume that the
firm's expectation of its profit for period t is of the form

$$X_t = X_t^{e} + \varepsilon_t,$$

where $\varepsilon_t \sim \text{iid}(0, \sigma^2)$. Then the probability of visiting the financial markets can be written as

$$\Pr(S_t + B_t > 0) = \Pr(I_t + D_t - X_t > 0) = \Pr(I_t + D_t - X_t^{e} - \varepsilon_t > 0) = \Pr(\varepsilon_t < -X_t^{e} + I_t + D_t). \tag{3.15}$$

The agency cost hypothesis suggests that dividends will be chosen to set
the average probability equal to the target probability of visiting
financial markets, for which the notation $\mu$ will be used. If we let
$f(\varepsilon)$ and $F(\varepsilon)$ be the density and distribution function of $\varepsilon$,
respectively, the probability (3.15) can be expressed as follows:

$$\int_{-\infty}^{-X_t^{e} + I_t + D_t} f(\varepsilon)\, d\varepsilon = \mu. \tag{3.16}$$

The target probability $\mu$ is considered a function of the costs and
benefits of monitoring, which are characterized by the vector Z. The
equation can then be rewritten using the distribution function:

$$F(-X_t^{e} + I_t + D_t) = \mu(Z_t). \tag{3.17}$$

If we assume that F(.) is invertible, then we can obtain the
solution for the optimum dividend level from equation (3.17):

$$D_t = F^{-1}[\mu(Z_t)] + X_t^{e} - I_t. \tag{3.18}$$

This solution suggests that the optimum dividend level may be
determined by the net cash flow, $X_t^{e} - I_t$, and by the other variables
characterizing monitoring benefits and costs.

To complete the model of the agency cost hypothesis, it is necessary
to select the variables that characterize agency costs. Unfortunately,
the selection of relevant variables is somewhat arbitrary. The first
empirical study to include variables representing agency costs is
that of Rozeff (1982). He uses two variables to measure agency
costs: the percentage of stock held by insiders and the number of common
stockholders. His model and results are discussed in the next
section.
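Equation (3.18) can be evaluated directly once a distribution for ε_t is chosen. The sketch below assumes, for illustration only, that ε_t is normal with standard deviation σ. It also takes the target probability μ(Z_t) as a given number, since the text leaves the form of μ(·) unspecified. It uses Python's `statistics.NormalDist` for F and F⁻¹. It then confirms that the resulting dividend reproduces the target probability in (3.17).

```python
from statistics import NormalDist

def agency_dividend(target_prob, expected_profit, investment, sigma):
    """Optimum dividend from (3.18): D_t = F^{-1}(mu) + X_t^e - I_t.

    Assumes eps_t ~ N(0, sigma^2). target_prob is mu(Z_t), the target
    probability of visiting the financial markets, with 0 < mu < 1.
    """
    F = NormalDist(0.0, sigma)
    return F.inv_cdf(target_prob) + expected_profit - investment

def prob_external_finance(dividend, expected_profit, investment, sigma):
    """Pr(S_t + B_t > 0) = F(-X_t^e + I_t + D_t), from (3.15)-(3.17)."""
    return NormalDist(0.0, sigma).cdf(-expected_profit + investment + dividend)

d = agency_dividend(0.3, expected_profit=100.0, investment=60.0, sigma=10.0)
print(d, prob_external_finance(d, 100.0, 60.0, 10.0))
```

Raising μ raises the dividend through F⁻¹(μ). In this setup, a firm whose monitoring benefits call for more frequent visits to the capital markets pays out more. At μ = 0.5, the dividend equals the expected net cash flow X_t^e − I_t.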


Empirical  Studies  on  Dividend  Behavior 
An underlying theoretical device in almost all dividend models
using regression analysis is the model proposed by Lintner (1956). The
model is a naive one:

$$D_{it} = a_i + b\,Y_{it} + d\,D_{i,t-1} + u_{it}, \tag{3.19}$$

where $D_{it}$ = dividend payout at period t,
$Y_{it}$ = current net earnings at period t.

The rationale for the model is that dividends depend on current net
earnings but are also influenced by past dividends, because of the
reluctance to cut dividends or to raise them to levels that may not be
maintainable. A more detailed theoretical rationale for the Lintner model
is a partial adjustment model. The Lintner model (3.19) is derived from
the equation

$$D_{it} - D_{i,t-1} = a_i + c_i\,(D_{it}^{*} - D_{i,t-1}) + u_{it}, \tag{3.20}$$

where $D_{it}$ = dividend payout at period t,
$c_i$ = speed of adjustment,
$D_{it}^{*}$ = unobserved target value,
$a_i$ = constant term,
$u_{it}$ = stochastic error term, assumed $IN(0, \sigma^2)$.

Lintner replaces the target dividend $D_{it}^{*}$ by $\gamma_i Y_{it}$, where $\gamma_i$ is the target
payout ratio and $Y_{it}$ is the current net earnings. Then the equation can
be modified as

$$D_{it} = a_i + c_i\gamma_i Y_{it} + (1 - c_i)\,D_{i,t-1} + u_{it} = a_i + b\,Y_{it} + d\,D_{i,t-1} + u_{it}, \tag{3.21}$$

where $b = c_i\gamma_i$ and $d = 1 - c_i$.

Lintner finds that the model explains 85% of the changes in dividends,
that the average value of c is 0.3, and that $\gamma$ is approximately 0.5 (see Lintner,
1956, pp. 108-109).
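A small simulation shows how the structural parameters of (3.20) map into the reduced-form coefficients of (3.21). The sketch below is not Lintner's estimation procedure. It assumes a hypothetical earnings path and sets c = 0.3 and γ = 0.5 to match the averages Lintner reports. It generates noiseless dividends from (3.20), then recovers b and d by OLS on (3.21) using NumPy. From those it backs out c = 1 − d and γ = b/(1 − d).

```python
import numpy as np

def simulate_lintner(earnings, a=0.1, c=0.3, gamma=0.5, d0=1.0, noise_sd=0.0, seed=0):
    """Generate dividends D_0, ..., D_T from the partial adjustment model (3.20).

    The target dividend is D*_t = gamma * Y_t; noise_sd = 0 gives exact data.
    """
    rng = np.random.default_rng(seed)
    divs = [d0]
    for y in earnings:
        prev = divs[-1]
        u = rng.normal(0.0, noise_sd) if noise_sd > 0 else 0.0
        divs.append(prev + a + c * (gamma * y - prev) + u)
    return np.array(divs)

def fit_lintner(earnings, divs):
    """OLS of D_t on (1, Y_t, D_{t-1}) as in (3.21); recover c = 1 - d and gamma = b / c."""
    X = np.column_stack([np.ones(len(earnings)), earnings, divs[:-1]])
    coef, *_ = np.linalg.lstsq(X, divs[1:], rcond=None)
    a_hat, b_hat, d_hat = coef
    c_hat = 1.0 - d_hat
    return a_hat, b_hat, d_hat, c_hat, b_hat / c_hat

t = np.arange(40, dtype=float)
earnings = 10.0 + 0.2 * t + np.sin(t)   # hypothetical earnings path
divs = simulate_lintner(earnings)
print(fit_lintner(earnings, divs))
```

With a nonzero `noise_sd`, the estimates become approximate rather than exact.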

Fama and Babiak (1968) use simulations and a prediction power test
to investigate several different model specifications. Their suggested
models can be classified by whether the constant term is included and
whether lagged earnings and/or depreciation are included.9
The results of the regressions, simulations, and prediction tests, using
a sample of 201 firms with 17 years of data (1947-1964), show that the
Lintner model including a constant term performs well relative to other
models; a model suppressing the constant term and adding a lagged earnings
term shows slightly better predictive power; and including
depreciation as a separate explanatory variable does not yield good
results.

Nakamura and Nakamura (1985) compare the Lintner model to a model
that adds a lagged earnings variable as an explanatory variable. They
call it a rational expectations model since they assume rational
expectations to derive the model. In their rational expectations model
of dividend behavior, the target dividend payout is assumed to be a
fraction of the permanent earnings of a firm rather than of its
current earnings. Permanent earnings are used rather than current
earnings because management may view changes in the current earnings
flow as essentially transitory, and such changes would therefore not be
likely to cause a noticeable change in dividends. Nakamura
and Nakamura formulate permanent earnings as the expected future earnings
under rational expectations:

9 Lintner (1956) argues that a constant term, which is expected to
be positive, should be included in a model to reflect the greater
reluctance to reduce than to raise dividends.


$$D_{it}^{*} = \gamma_i Y_{it}^{P}, \qquad Y_{it}^{P} = \delta \sum_{j=0}^{\infty} b^{j} E_t Y_{i,t+j}, \tag{3.22}$$

where $\delta$ is the real rate of return, assumed to be constant, and
b is the discount factor. Some mathematical manipulation leads to
the model10

$$D_{it} = a_0 + a_1 Y_{it} + a_2 Y_{i,t-1} + a_3 D_{i,t-1} + e_{it}. \tag{3.23}$$

They use OLS to estimate the pooled regression. The results show that the
rational expectations model is more appropriate for explaining firms'
dividend behavior, since it produces a higher $R^2$ and has better
predictive power than the Lintner model (see Nakamura and Nakamura,
1985, Tables 1 and 2).

10 For the derivation procedure and the formulations of the
coefficients and error term, refer to Nakamura and Nakamura (1985).


Marsh and Merton (1987) use stock prices instead of accounting
earnings to measure permanent earnings.11 They derive the relationship
between permanent earnings and the cum-dividend price of a share.

The permanent earnings variable is defined as the fraction of the
discounted value of the expected cash flow available for distribution to
each outstanding share. Since the cum-dividend price of a share can be
split into two components (the ex-dividend price and the dividend),
a model that contains the dividend variable and the stock price
variable instead of the earnings variable can be obtained. Their stock-price
model takes the form of an error correction model:

$$\log\left(\frac{D_{t+1}}{D_t}\right) = a_0 + a_1 \log\left(\frac{P_t + D_t}{P_{t-1}}\right) + a_2 \log\left(\frac{D_t}{P_{t-1}}\right) + u_{t+1}$$

The results of estimation by OLS and GLS, using the Center for
Research in Security Prices (CRSP) data set over the period 1926-1981,
show that the model explains about 50% of aggregate real dividend
changes and that the estimated coefficient on the lagged percentage price
change ($a_1$) is highly significant.12 Marsh and Merton
compare the performance of the stock-price model with that of the model
using accounting earnings and show that the two models perform
similarly. However, the stock-price model has a major advantage over
the conventional model using the earnings variable. Since the stock-price
model uses only lagged prices, it can be used to forecast future
dividend changes, whereas the conventional model cannot.13

11 Marsh and Merton do not provide a rationale for using the
stock price variable rather than the earnings variable to assess
permanent earnings. However, a rationale can be found in the study by
Friend and Puckett (1964), who point out that there is almost no
measurement error in the dividend variable but considerable
measurement error in the earnings variable. Accounting earnings are
controllable by management; therefore, accounting measures of earnings
often reflect the real economic earnings of the firm imprecisely. The
measurement error in the earnings variable will cause its coefficient to
be biased downward.

12 Results from the estimation of the Lintner model using the
earnings variable report about 40%-45% explanatory power. Refer to
Table 2 in Fama and Babiak (1968).

The next empirical question we will review is whether dividends
convey information about future earnings and whether dividend
announcements affect share values. Watts (1973) uses the following
dividend model, suggested by Fama and Babiak, to test the hypothesis
that dividends convey information about the future earnings stream:

$$\Delta D_{it} = \beta_{1i} D_{i,t-1} + \beta_{2i} Y_{it} + \beta_{3i} Y_{i,t-1} + z_{it}. \tag{3.24}$$

The parameters of equation (3.24) are estimated for each of the 310
firms using OLS. Then the change in earnings for period t+1 is
regressed on the calculated residuals:


$$\Delta Y_{i,t+1} = \gamma_i + \delta_i \hat{z}_{it} + w_{i,t+1}. \tag{3.25}$$

The estimated residual $\hat{z}_{it}$ can be interpreted as the unexpected change
in dividends. Therefore, the information hypothesis can be tested using
equation (3.25). OLS estimation shows that the relationship between the
unexpected change in dividends and the change in future earnings is
positive on average. However, the relationship is weak (see
Watts, 1973, p. 201).

13 In the accounting-earnings model, the variable Y(t+1)/Y(t) is
used in place of P(t)/P(t-1) of the stock-price model.

Watts then uses the estimated unexpected changes in dividends to
test the effect of dividend changes on share prices. The abnormal
performance index (API) is obtained from the capital market model

$$R_{it} = \alpha_i + \beta_i R_{mt} + \varepsilon_{it}, \tag{3.26}$$

where $R_{it}$ = the total return on the common share of firm i,
$\alpha_i$ = a constant term,
$\beta_i$ = systematic risk parameter,
$R_{mt}$ = the market rate of return,
$\varepsilon_{it}$ = the abnormal performance of the security.

The API for a security cumulates its one-month abnormal returns as the
product of one plus each monthly residual:

$$API_T = \prod_{t=1}^{T} (1 + \varepsilon_{it}), \qquad T = 1, 2, \ldots, N. \tag{3.27}$$

Under the information hypothesis, one expects positive abnormal
price changes to accompany positive unexpected dividend changes and
negative abnormal price changes to accompany negative unexpected dividend
changes. Hence, the APIs for positive unexpected dividend changes
should be greater than the APIs for the overall sample, and the
APIs for negative unexpected dividend changes should be less than the
APIs for the overall sample. However, Watts concludes from his results
that there is very little relationship between the dividend residual of
the year and the monthly average stock return (refer to Table 7 in
Watts, 1973, p. 206).
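The two steps behind the API can be sketched as follows, assuming monthly return arrays are already in hand. First, fit the market model (3.26) by OLS. Second, cumulate the residuals into the API of (3.27). This does not reproduce Watts's exact procedure. In particular, it fits and cumulates over the same months, and it does not group observations by the sign of the dividend residual.

```python
import numpy as np

def market_model_residuals(stock_returns, market_returns):
    """OLS fit of (3.26), R_it = alpha_i + beta_i R_mt + e_it.

    Returns (alpha, beta, residuals).
    """
    X = np.column_stack([np.ones(len(market_returns)), market_returns])
    (alpha, beta), *_ = np.linalg.lstsq(X, stock_returns, rcond=None)
    residuals = stock_returns - (alpha + beta * market_returns)
    return alpha, beta, residuals

def abnormal_performance_index(residuals):
    """API_T = prod_{t=1}^{T} (1 + e_t) for T = 1, ..., N, as in (3.27)."""
    return np.cumprod(1.0 + np.asarray(residuals, dtype=float))

print(abnormal_performance_index([0.01, -0.02, 0.03]))
```

An API above one after month T means the security has outperformed its market-model benchmark over the first T months. An API below one means it has underperformed.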

Pettit (1972) disputes this conclusion. He uses both monthly and
daily data to investigate the API of firms and concludes
that substantial information is conveyed by the announcement of dividend
changes. Pettit argues that the classification scheme used by Watts,
which is based on the sign of the estimated unexpected change in
dividends, is not appropriate, since the dividend change variable is
discrete rather than continuous.14 Moreover, using the annual payment
understates the magnitude of the change unless the firm makes the change
in the first quarter, since dividends are typically set at a quarterly
rather than an annual rate.15

A study by Aharony and Swary (1980) separates the information
content of unexpected quarterly dividend changes from that of quarterly
earnings reports to investigate the ICD hypothesis. They examine only
those quarterly dividend and earnings announcements made public on
different dates within a given quarter, to ascertain whether quarterly
dividend changes provide information beyond that already provided by
quarterly earnings announcements. They compare the cumulative abnormal
returns in the days surrounding the dividend and earnings announcement
dates and conclude that quarterly dividend payments provide useful
information beyond what is provided by quarterly earnings announcements.

14 According to Pettit, Watts's classification scheme leads to an upward
bias in the API for the negative dividend information group and a
downward bias in the API for the positive dividend information group.

15 Watts (1976) argues that the conflicting results between Watts
(1973) and Pettit (1976) are caused by misspecification of the
earnings variable used by Pettit, and that the conclusion of Pettit's
study is not reliable.

In sum, the ICD hypothesis is supported by many empirical studies.
In other words, unexpected dividend changes likely convey information
about future earnings to the market.

Another empirical issue that may be related to the ICD hypothesis
is the Miller and Modigliani dividend irrelevancy proposition. One
simple test is to look directly at the relationship between dividend
payments and the share price. The usual regression model relates stock
price to current dividends and retained earnings:

$$P_{it} = \alpha + \beta_1 D_{it} + \beta_2 Y_{it} + \varepsilon_{it}, \tag{3.28}$$

where $P_{it}$ = the price per share,
$D_{it}$ = dividend payout,
$Y_{it}$ = retained earnings,
$\varepsilon_{it}$ = error term.

However, the approach has major problems. One is the
omission of a risk variable, which may cause an upward bias in the
estimate of the dividend coefficient. A second is measurement error in
the retained earnings variable, which causes a downward bias in that
variable's estimated coefficient (see Friend and Puckett, 1964).
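A short simulation illustrates the downward bias caused by measurement error in an earnings regressor. The sketch assumes a single regressor, classical measurement error (independent and mean-zero), and hypothetical variances. Under those assumptions, the OLS slope shrinks toward zero by the factor Var(Y)/[Var(Y) + Var(error)], which is 1/2 here. The omitted-risk-variable problem is not addressed.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))

rng = np.random.default_rng(0)
n = 200_000
true_earnings = rng.normal(0.0, 1.0, n)                 # hypothetical true retained earnings
price = 2.0 * true_earnings + rng.normal(0.0, 0.5, n)   # true coefficient is 2
measured = true_earnings + rng.normal(0.0, 1.0, n)      # accounting measure with error

slope_true = ols_slope(true_earnings, price)     # close to 2.0
slope_measured = ols_slope(measured, price)      # close to 2.0 * 1 / (1 + 1) = 1.0
print(slope_true, slope_measured)
```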

A more sophisticated study by Black and Scholes (1974) combines
the dividend model and the capital asset pricing model (CAPM). The
original CAPM says that the expected return on any security should be a
linear function of its systematic risk:

$$E(R_i) = R + [E(R_m) - R]\,\beta_i, \tag{3.29}$$

where $E(R_i)$ = the expected return on security i,
$E(R_m)$ = the expected return on the market portfolio,
$R$ = the riskless short-term interest rate,
$\beta_i$ = the systematic risk, which can be expressed as $\mathrm{Cov}(R_i, R_m)/\mathrm{Var}(R_m)$.
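Equation (3.29) and its risk measure can be sketched in a few lines of plain Python. β_i is estimated from sample moments as Cov(R_i, R_m)/Var(R_m), and the CAPM expected return follows from it. The return series and rates used in the check below are hypothetical.

```python
def beta(stock_returns, market_returns):
    """Systematic risk Cov(R_i, R_m) / Var(R_m), using sample moments."""
    n = len(market_returns)
    mean_i = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((ri - mean_i) * (rm - mean_m)
              for ri, rm in zip(stock_returns, market_returns)) / (n - 1)
    var = sum((rm - mean_m) ** 2 for rm in market_returns) / (n - 1)
    return cov / var

def capm_expected_return(riskless, expected_market, beta_i):
    """Equation (3.29): E(R_i) = R + [E(R_m) - R] * beta_i."""
    return riskless + (expected_market - riskless) * beta_i

rm = [0.02, -0.01, 0.03, 0.00, 0.015]
ri = [0.001 + 1.2 * x for x in rm]   # hypothetical stock that moves 1.2-for-1 with the market
print(beta(ri, rm), capm_expected_return(0.05, 0.11, 1.2))
```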


Black and Scholes modify equation (3.29) as follows:

$$E(R_i) = \gamma_0 + [E(R_m) - \gamma_0]\,\beta_i, \tag{3.30}$$

where $\gamma_0$ is significantly greater than R. If we suppose that the
dividend yield is related to the expected return on stocks and that the
relationship is linear, we have

$$E(R_i) = \gamma_0 + [E(R_m) - \gamma_0]\,\beta_i + \gamma_1 \frac{\delta_i - \delta_m}{\delta_m}, \tag{3.31}$$

where $\delta_i$ = the dividend yield on stock i,
$\delta_m$ = the dividend yield on the market.

If the dividend yield coefficient, $\gamma_1$ in (3.31), turns out to be
significantly different from zero, we would reject the hypothesis that
dividend policy does not matter. However, Black and Scholes' estimation
suggests that $\gamma_1$ is not significantly different from zero; thus, the
share price is not affected by the dividend policy of the firm.

A study by Fama (1974) examines the validity of the M-M
proposition by investigating the interrelationship between the dividend
decision and the investment decision of the firm. The M-M proposition
says that the investment decision should never be affected by the
dividend decision, but it does not rule out the possibility of the opposite
causality. Fama uses a simultaneous equations model with structural
equations for dividends and investment. He examines the prediction
errors of several dividend models and investment models to select
the best structural equations for dividends and investment. The
Lintner model, and the two-variable output model and its differenced form,
show better performance than any other model. Fama then uses the
two-stage least squares (2SLS) method to estimate the simultaneous
equations, and investigates whether there is any systematic relationship
between the parameter estimates or residuals of the dividend model and those of the
investment model. The hypothesis of no period-by-period association
between the dividend decision and the investment decision cannot be rejected
in his study. The result is consistent with the M-M dividend
irrelevancy proposition.

Another topic that must be investigated is the agency cost
hypothesis. According to this hypothesis, the optimum dividend
level may be determined by net cash flow and other variables that
characterize agency benefits and costs.

Rozeff (1982) adds variables that characterize agency
costs to the explanation of dividend behavior. He suggests three factors that
may influence the dividend payout ratio: investment, new long-term debt,
and agency costs. The variables included in the regression model are the
average growth rate of revenue, the forecast of the average growth rate
of revenue, the beta coefficient, the number of common stockholders, and
the percentage of common stock held by insiders. The two
growth rate variables are interpreted as proxy variables for investment.
The beta coefficient is the risk parameter and is related to long-term
debt. The two variables related to common stock are used to measure
agency costs. The rationale for the agency cost variables is that as
outside stockholders own a larger share of the common stock, they will
demand higher dividends as part of the optimum monitoring package. And
as the number of common stockholders becomes smaller, ownership
becomes more concentrated, thereby reducing agency costs and lowering the
dividend payout.

The results of estimating the regression on a sample from the
Value Line Investment Survey of 1981, which consists of 1,000 non-financial
and non-regulated firms, show that the number of common
stockholders has a significantly positive effect on the dividend
payout. The other variables have significantly negative effects on the
dividend payout, as expected. The results are consistent with
the agency cost hypothesis and indicate that investment policy
influences dividend policy. This contradicts the M-M dividend
irrelevancy proposition.

So far, all the econometric models reviewed have used OLS or some
closely related variant. Another specification is the friction model, in
which the dependent variable responds only to large values of the
exogenous variables. Maddala (1983) points out that the friction model is
more appropriate than the traditional partial adjustment model for
describing dividend behavior, since the partial adjustment model does not
capture the fact that dividends change in discrete jumps. The friction
model of dividend behavior can be written as


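$$
y_{it} =
\begin{cases}
0 & \text{if } y^{*}_{it} < L_0, \\
L_0 & \text{if } L_0 < y^{*}_{it} < L_1, \\
L_1 & \text{if } L_1 < y^{*}_{it} < L_2.
\end{cases}
$$

Let the latent variable $y^{*}_{it}$ be

$$
y^{*}_{it} = \beta' X_{it} + u_{it}, \qquad u_{it} \sim IN(0, \sigma^2);
$$

then the likelihood function is obtained as follows:

$$
L(\beta, \sigma \mid L_0, L_1, L_2, X) =
\prod_{y_{it}=0} \frac{\Phi_{0t}}{\Phi_{2t}}
\prod_{y_{it}=L_0} \frac{\Phi_{1t} - \Phi_{0t}}{\Phi_{2t}}
\prod_{y_{it}=L_1} \frac{\Phi_{2t} - \Phi_{1t}}{\Phi_{2t}},
$$

where $\Phi_{0t}$, $\Phi_{1t}$ and $\Phi_{2t}$ are the corresponding
distribution functions at the points $y^{*}_{it} = L_0$, $L_1$ and $L_2$,
that is, $\Phi_{jt} = \Phi\big((L_j - \beta' X_{it})/\sigma\big)$.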
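To make the likelihood above concrete, the sketch below evaluates its logarithm for given parameters. It is a minimal illustration, assuming NumPy and SciPy, a hypothetical function name `friction_loglik`, and a hypothetical coding `cat` in which 0, 1 and 2 stand for the observed values 0, $L_0$ and $L_1$; it is not code from the studies cited here.

```python
import numpy as np
from scipy.stats import norm


def friction_loglik(beta, sigma, X, cat, L0, L1, L2):
    """Log-likelihood of the grouped friction model, truncated above at L2.

    cat[i] = 0, 1 or 2 codes the observed values y = 0, L0, L1 (hypothetical coding).
    """
    xb = X @ beta
    # Phi_j = P(y* <= L_j | X) for j = 0, 1, 2
    phi0 = norm.cdf((L0 - xb) / sigma)
    phi1 = norm.cdf((L1 - xb) / sigma)
    phi2 = norm.cdf((L2 - xb) / sigma)
    # probability of the observed interval, conditional on y* < L2
    num = np.choose(cat, [phi0, phi1 - phi0, phi2 - phi1])
    return float(np.sum(np.log(num / phi2)))
```

Because the three interval probabilities are divided by $\Phi_{2t}$, they sum to one for each observation, which is a quick sanity check on any implementation.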

An example of the application of this model is the cross-sectional
study of dividend behavior by Kao, Lee and Wu (1988). They use a model
suggested by Rosett (1959) to investigate dividend stickiness:

$$
\Delta D_t =
\begin{cases}
\Delta D^{*}_t - C_1 & \text{if } \Delta D^{*}_t < C_1, \\
0 & \text{if } C_1 \le \Delta D^{*}_t \le C_2, \\
\Delta D^{*}_t - C_2 & \text{if } C_2 < \Delta D^{*}_t,
\end{cases}
$$

where $\Delta D^{*}_t$ is the true change in dividends, while $\Delta D_t$
is the actual (observed) change in dividends. The threshold $C_1$, which
is less than zero, is the desired decrease that must be exceeded before
dividends are actually cut, while $C_2$, which is greater than zero, is
the desired increase that must be exceeded before dividends are actually
raised. The
estimation  results  with  the  assumption  of  C^=0  do  not  strongly  support 
the presence of friction. However, the fourth-order autoregressive
process in the error term, which is significantly present in the
nonlinear regression, disappears when the friction model is used. They
therefore conclude that the fourth-order serial correlation is related to
the discrete jumps in dividend payout, and interpret this result as
confirming the presence of dividend stickiness.
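The Rosett friction rule itself is easy to state in code. The sketch below, with a hypothetical function name, only illustrates how a true change inside the band $[C_1, C_2]$ is absorbed while larger changes pass through net of the threshold; it is not Kao, Lee and Wu's estimation code.

```python
def observed_change(desired, c1, c2):
    """Rosett (1959) friction rule: map the true change to the observed change.

    c1 <= 0 <= c2 play the role of the thresholds C_1 and C_2.
    """
    if not c1 <= 0.0 <= c2:
        raise ValueError("expected c1 <= 0 <= c2")
    if desired < c1:
        return desired - c1   # large desired cut: dividend falls by the excess over C_1
    if desired > c2:
        return desired - c2   # large desired rise: dividend rises by the excess over C_2
    return 0.0                # inside the band: dividend left unchanged


print([observed_change(d, -0.25, 0.5) for d in (-1.0, 0.1, 1.0)])  # [-0.75, 0.0, 0.5]
```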


CHAPTER  4 

MODELING  AND  ESTIMATION  OF  DIVIDEND  BEHAVIOR 
OF  THE  U.S.  MANUFACTURING  INDUSTRY 

Introduction  and  Data 

In the previous chapter, we reviewed theoretical and empirical
studies of dividend behavior. Given the Miller-Modigliani dividend
irrelevancy proposition, dividends should not be paid because paying
them is costly. Recent attempts to overturn the M-M proposition, and to
explain why firms pay dividends in spite of the cost, are applications of
information theory.

The signaling hypothesis is that the dividend payout conveys
information about the firm's true value and its future. The agency cost
hypothesis says that the optimum level of dividend payout is influenced
by variables characterizing monitoring benefits and costs. Whether these
hypotheses hold is an empirical question.

In this chapter, we propose a dividend behavior model that follows
the framework of the Lintner model. The weaknesses of the empirical
studies of dividend behavior so far can be summarized as two problems.
The first concerns the model specification and/or data. Most empirical
studies are cross-sectional studies: they estimate the dividend behavior
equation for each firm using time-series data and report the
cross-sectional results. Beyond the advantages of panel data presented in
Chapter 2, panel data are more appropriate when we are concerned with
the dividend behavior of an industry over a given period. The second
weakness concerns the estimation techniques. Most of the early studies
use simple OLS, which presents serious problems because many observations
on dividend payout are zero. Since the dividend payout is censored at
zero, the OLS estimator is biased. On the other hand, if we use only the
nonzero observations for estimation, we suffer from a selectivity bias
problem.

In this chapter, a tobit model for panel data is used to specify the
industry dividend behavior model, and the model is estimated by maximum
likelihood.

The data are obtained from the latest issue of the Compustat tape.
Because data on the variables we need are not uniformly available, only
12 consecutive years, from 1976 to 1987, are used to analyze the dividend
behavior of the manufacturing industry in the United States. Firms that
do not have all the required data for the whole sample period are
excluded. The data set is then divided into ten subsets based on the
industry code of the Panel Study of Income Dynamics (PSID). The structure
of the data sets is reported in Table 4.1.

The manufacturing industry is selected for study because regulated
firms, such as gas, telephone and electric utilities, and financial
firms, such as banks, insurance companies and loan and investment
companies, are not appropriate for our study of dividend behavior. The
financing policies of these firms, including their dividend policies, may
be significantly affected by regulation. In fact, the data for those
firms show no remarkable fluctuations in dividend payouts, whereas such
fluctuations are easily observed in the data for the manufacturing
industry. Moreover, zero dividend payouts, which require the application
of a tobit model, are rarely observed for those firms.

Modeling  Dividend  Behavior 

As discussed in Chapter 3, the Lintner model has received empirical
and theoretical support. The main criticism of the existing empirical
models is directed not at the model itself but at the specification
and/or estimation techniques. Since Lintner suggested the model and used
OLS to verify it, OLS has been widely used in empirical work on dividend
behavior. However, OLS estimates of dividend behavior models suffer from
serious problems. Many firms pay no dividends, and because dividends
cannot be negative, the observations on dividend payout are censored at
zero. As mentioned in Chapter 2, OLS estimates are downward biased if we
use all the observations, including those with zero dividends, and using
only the observations with positive dividends cannot be justified because
of the selectivity bias problem. It is therefore more appropriate to
apply some form of tobit model, which yields consistent estimates for
censored data, to the dividend behavior model.

Another possible criticism concerns the data used in empirical
studies. Cross-sectional differences and time-period differences are
usually the main interests of empirical studies, and it is desirable to
investigate both together where possible. Neither cross-section data
alone nor time-series data alone can explain the two kinds of
differences together. Furthermore, pooled data should be used if we want
to explain the behavior of an industry over time. A desirable model
specification for the dividend behavior of firms is therefore the
extension of the tobit model to panel data.

The framework proposed in this chapter follows Lintner's model, whose
only explanatory variables are earnings and the lagged dividend. Other
variables, which represent the signaling hypothesis and the agency cost
hypothesis, are added as extra explanatory variables to test these
dividend theories: the growth rate of sales (GROWTH), the debt-equity
ratio (DER) and the logarithm of the number of common shareholders (CSH).
GROWTH and DER are used as proxies for variables characterizing the
firm's future, while CSH represents agency cost.

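The structure of the model is a tobit model with zero threshold:

$$
\begin{aligned}
D^{*}_{it} &= \alpha_i + \beta_1 Y_{it} + \beta_2 D_{i,t-1} + \beta_3 \mathrm{GROWTH}_{it}
+ \beta_4 \mathrm{DER}_{it} + \beta_5 \mathrm{CSH}_{it} + u_{it}, \\
D_{it} &=
\begin{cases}
D^{*}_{it} & \text{if } D^{*}_{it} > 0, \\
0 & \text{otherwise}.
\end{cases}
\end{aligned}
\tag{4.1}
$$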
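For reference, the log-likelihood of (4.1) under i.i.d. errors can be sketched as below, with a single common intercept standing in for the $\alpha_i$ (the simplification adopted for Tobit1 later in this section). The function name and the layout of `X` (a column of ones plus the regressors of (4.1)) are assumptions; NumPy and SciPy are assumed available.

```python
import numpy as np
from scipy.stats import norm


def tobit_loglik(beta, sigma, X, d):
    """Log-likelihood of a tobit model censored at zero with Var(u) = sigma**2.

    X: (n, k) regressors including a column of ones (common intercept);
    d: observed dividends, zero when censored.
    """
    xb = X @ beta
    pos = d > 0
    # censored observations: P(D* <= 0) = Phi(-x'b / sigma)
    ll_zero = norm.logcdf(-xb[~pos] / sigma)
    # positive observations: normal density of D - x'b, scaled by 1/sigma
    ll_pos = norm.logpdf((d[pos] - xb[pos]) / sigma) - np.log(sigma)
    return float(ll_zero.sum() + ll_pos.sum())
```

Using `logcdf` and `logpdf` rather than taking logs of probabilities avoids underflow for observations far in the tails.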


Three different error specifications are assumed in estimating this
model. The first is the fixed effects model, in which the stochastic
error term is assumed to be independently and identically distributed
for all observations. The others are heteroscedastic adjustments of the
random effects model. As discussed in Chapter 2, the conventional
stochastic assumptions on the firm specific terms and the error terms
lead to an error covariance matrix whose off-diagonal elements are not
all zero, which makes the model difficult to estimate. For convenience of
estimation, all the off-diagonal elements are assumed to be zero, and the
diagonal elements are made heteroscedastic to capture the nonzero error
covariance between a firm's observations in different time periods. The
diagonal elements of the error covariance matrix are assumed to be either
the arithmetic or the geometric mean of two variances: the variance
characterizing the cross-sectional distribution of the errors in each
period, $\theta_t^2$, and the variance characterizing the time-series
distribution of the errors for each firm, $\sigma_i^2$. These three error
covariance structures can be written formally as follows:


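Structure 1:

$$
\mathrm{Cov}(u_{it}, u_{js}) =
\begin{cases}
\sigma^2 & \text{for } i = j,\ t = s, \\
0 & \text{otherwise}.
\end{cases}
\tag{4.2a}
$$

Structure 2:

$$
\mathrm{Cov}(u_{it}, u_{js}) =
\begin{cases}
(\sigma_i^2 + \theta_t^2)/2 & \text{for } i = j,\ t = s, \\
0 & \text{otherwise}.
\end{cases}
\tag{4.2b}
$$

Structure 3:

$$
\mathrm{Cov}(u_{it}, u_{js}) =
\begin{cases}
(\sigma_i^2 \theta_t^2)^{1/2} & \text{for } i = j,\ t = s, \\
0 & \text{otherwise}.
\end{cases}
\tag{4.2c}
$$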
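The diagonal variances implied by (4.2b) and (4.2c) can be built with NumPy broadcasting, as in the minimal sketch below. The function name and the string labels `"4.2b"` and `"4.2c"` are hypothetical; (4.2a) is omitted because it uses one common $\sigma^2$.

```python
import numpy as np


def diagonal_variances(sigma2_i, theta2_t, structure):
    """N x T array of Var(u_it) under (4.2b) or (4.2c); off-diagonals are zero.

    sigma2_i: firm variances (length N); theta2_t: period variances (length T).
    """
    s = np.asarray(sigma2_i, dtype=float)[:, None]   # N x 1, broadcast across periods
    th = np.asarray(theta2_t, dtype=float)[None, :]  # 1 x T, broadcast across firms
    if structure == "4.2b":
        return (s + th) / 2.0      # arithmetic mean
    if structure == "4.2c":
        return np.sqrt(s * th)     # geometric mean
    raise ValueError("structure must be '4.2b' or '4.2c'")
```

Since a geometric mean never exceeds the arithmetic mean of the same two numbers, every variance under (4.2c) is at most the corresponding variance under (4.2b).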

Model (4.1) is estimated by maximum likelihood under each of these
three assumptions. The detailed procedure and the estimation results are
discussed in the next section.

The heteroscedastic adjustments of the random effects specification
are assumptions of convenience rather than ones derived from theory, so
they must be subjected to testing. Several criteria are examined to
compare the error specifications, and the results of these tests are also
discussed in the next section.


Estimation  and  Specification  Analysis 


The dividend behavior models with the error covariance structures
(4.2a), (4.2b) and (4.2c) are estimated using U.S. manufacturing industry
data. For comparison and evaluation, LSDV and GLS estimates are obtained
from regressions using only the observations with positive dividend
payouts; these estimates and the associated test statistics are reported
in the first and second columns of the tables. The estimates of the firm
specific effects and the error variances are not reported in the tables.

The LM statistic, asymptotically distributed as $\chi^2(1)$, tests the
random effects model against the classical regression model without firm
specific effects; large values of the LM statistic favor the random
effects model. The LM statistics from the regressions show that the data
for four industries favor the random effects specification, while the
data for the others reject it.
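The text does not give the formula for the LM statistic. Assuming it is the Breusch-Pagan (1980) Lagrange multiplier test for random effects, computed from pooled OLS residuals of a balanced panel, it can be sketched as follows; the function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def breusch_pagan_lm(resid):
    """Breusch-Pagan LM statistic for random firm effects (balanced panel).

    resid: N x T array of pooled OLS residuals (firms in rows, periods in columns).
    """
    e = np.asarray(resid, dtype=float)
    n, t = e.shape
    # squared firm sums of residuals relative to the total sum of squares
    ratio = np.sum(e.sum(axis=1) ** 2) / np.sum(e ** 2)
    return n * t / (2.0 * (t - 1)) * (ratio - 1.0) ** 2


def favors_random_effects(lm, level=0.05):
    # compare with the chi-square(1) critical value (about 3.84 at the 5% level)
    return lm > chi2.ppf(1.0 - level, df=1)
```

At the 5% level, the reported values 12.792 (Table 4.2) and 17.804 (Table 4.4) exceed 3.84, while .0105 (Table 4.3) and .2618 (Table 4.5) do not.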

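The tobit model for dividend behavior is rewritten as follows:

$$
D_{it} =
\begin{cases}
\alpha_i + \beta' X_{it} + \varepsilon_{it} & \text{if } D^{*}_{it} > 0, \\
0 & \text{otherwise},
\end{cases}
\tag{4.3}
$$

where $X_{it}$ collects the explanatory variables of (4.1).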

The model Tobit1 is the fixed effects model, in which the error
covariance structure is (4.2a). The firm specific effects are constant
for each firm, and the stochastic error term $\varepsilon_{it}$ satisfies
the i.i.d. condition. An iterative estimation method for this type of
fixed effects tobit model has been proposed by Heckman and MaCurdy
(1980). However, this method is not very useful for models with many
cross-sectional units, because obtaining estimates of the fixed effects
$\alpha_i$ requires a great deal of computation time. Moreover, the
estimated $\alpha_i$'s may not be reliable if the number of observations
used to estimate each fixed effect is not large.1 In the actual
estimation, the fixed effects are therefore omitted and a common
intercept is added to simplify matters. This forms a baseline for
comparison with the results from the regressions under the other error
covariance specifications.

The maximum likelihood method is used for estimation. All the
algorithms for the maximum likelihood estimation use the following
general relationship to update the parameter estimates:

$$
\beta^{(r+1)} = \beta^{(r)} + s\, P\, g, \tag{4.4}
$$

where $P$ is some $k \times k$ positive definite matrix, $g$ is the
gradient vector of the log-likelihood function at $\beta^{(r)}$, and $s$
is the step length. The various algorithms differ in how the matrix $P$
is computed.
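A minimal NumPy sketch of update (4.4) with the Newton-Raphson choice of $P$ is shown below. The function names and the toy quadratic objective in the test are hypothetical, and a production implementation would use a linear solve rather than an explicit inverse.

```python
import numpy as np


def update(beta, grad, P, step=1.0):
    """One application of (4.4): beta_new = beta + s * P @ g."""
    return beta + step * (P @ grad)


def newton_raphson(grad_fn, hess_fn, beta0, tol=1e-8, max_iter=100):
    """Maximize a log-likelihood using P = -H^{-1}, the Newton-Raphson choice."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        P = -np.linalg.inv(hess_fn(beta))  # positive definite where H is negative definite
        new_beta = update(beta, grad_fn(beta), P)
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta
```

For a quadratic log-likelihood the Newton-Raphson step lands on the maximum in a single iteration, which is why the method converges quickly near the optimum.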

For the estimation of the model Tobit1, the Newton-Raphson method is
used, in which the matrix $P$ is the negative of the inverse of the
Hessian matrix of the log-likelihood function. The Newton-Raphson
algorithm converged well for this model. The estimation results are
reported in the third column of the tables. Estimates obtained with the
concentrated maximum likelihood method differ somewhat from the reported
results in the magnitude of the coefficient estimates, but the
significance of the coefficient estimates is much the same. Therefore,
only the results from the maximum likelihood method are reported.

1 In this study, the numbers of fixed effects to be estimated range
from 37 to 107, and each of them would be estimated using only 11
observations.

The other model specifications, Tobit2 and Tobit3, use
heteroscedastic adjustments of the random effects model. Under the random
effects specification the firm specific term is a random variable, so it
can be treated as part of the disturbance. The conventional stochastic
assumptions on the firm specific term and the error term result in an
intractable error covariance matrix with nonzero off-diagonal elements.
The model Tobit2 assumes the alternative error covariance structure
(4.2b), and the error covariance of the model Tobit3 follows structure
(4.2c).

The problem with the estimation of these two models is that there are
too many parameters to be estimated.2 A likelihood function with many
parameters is often not well behaved, because the likelihood function and
its gradient vector are very sensitive to the values of the disturbance
parameters. Convergence is therefore not guaranteed when there are too
many parameters to be estimated.

2 In this study, the number of parameters to be estimated is k+N+T,
where k is the number of coefficients, N is the number of firms, and T is
the number of periods. If we use the maximum likelihood method, the
inverse of the $(k+N+T) \times (k+N+T)$ Hessian matrix must be calculated
at each iteration.


The concentrated maximum likelihood method is used to solve this
problem. The disturbance parameters $\sigma_i^2$ and $\theta_t^2$ are
estimated using formulas (2.22a) and (2.22b), respectively, and then
substituted into the log-likelihood function to form the concentrated
log-likelihood function. Maximizing the concentrated log-likelihood
function is relatively easy, because it has to be maximized only with
respect to the structural parameter vector $\beta$.

The estimation procedure can be summarized as follows. OLS estimates,
computed from the observations with positive dividend payouts, are used
as the initial values for $\hat\beta$. The estimates of the disturbance
parameters, $\hat\sigma_i^2$ and $\hat\theta_t^2$, are obtained using
formulas (2.22a) and (2.22b) and then substituted into the concentrated
likelihood function. New parameter values are obtained from the gradient
vector and gradient matrix of the concentrated log-likelihood function
using the BHHH algorithm. This procedure is iterated until the estimated
values of $\hat\beta$, $\hat\sigma_i^2$ and $\hat\theta_t^2$ converge.
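The iteration can be sketched as below. Formulas (2.22a) and (2.22b) belong to Chapter 2 and are not reproduced here, so `update_variances` and `bhhh_step` are hypothetical callables that the reader must supply; the sketch only illustrates the OLS starting values, the alternation between the two updates, and the joint stopping rule.

```python
import numpy as np


def ols_on_positive(X, d):
    """Starting values: OLS using only the observations with positive dividends."""
    pos = d > 0
    return np.linalg.lstsq(X[pos], d[pos], rcond=None)[0]


def concentrated_ml(beta0, update_variances, bhhh_step, tol=1e-6, max_iter=500):
    """Alternate the disturbance-parameter update and a BHHH step on beta.

    update_variances(beta) -> (sigma2_i, theta2_t): placeholder for (2.22a)-(2.22b).
    bhhh_step(beta, sigma2_i, theta2_t) -> new beta on the concentrated likelihood.
    """
    beta = np.asarray(beta0, dtype=float)
    s2, th2 = update_variances(beta)
    for _ in range(max_iter):
        new_beta = bhhh_step(beta, s2, th2)
        new_s2, new_th2 = update_variances(new_beta)
        change = max(np.max(np.abs(new_beta - beta)),
                     np.max(np.abs(new_s2 - s2)),
                     np.max(np.abs(new_th2 - th2)))
        beta, s2, th2 = new_beta, new_s2, new_th2
        if change < tol:  # stop when beta, sigma_i^2 and theta_t^2 all settle
            break
    return beta, s2, th2
```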

The BHHH algorithm, developed by Berndt et al. (1974), is used to
maximize the concentrated log-likelihood functions of the models Tobit2
and Tobit3. The BHHH algorithm is a modified method of scoring that uses
the moment matrix $z'z$ in place of the information matrix, where $z$ is
the matrix of partial derivatives (the gradient matrix) of the
log-likelihood function with respect to the parameters, with one row per
observation. The advantage of BHHH over the Newton-Raphson algorithm is
that the positive definiteness of $z'z$, which takes the place of the
negative Hessian, is guaranteed whenever $z$ has full column rank,
because it is obtained by accumulating the outer products of each
observation's gradient.
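The BHHH ingredients for update (4.4) can be computed from the per-observation gradients as in this minimal sketch; the function name is hypothetical and NumPy is assumed.

```python
import numpy as np


def bhhh_direction(scores):
    """BHHH ingredients from the n x k matrix z of per-observation gradients.

    Returns P = (z'z)^{-1} and g = z'1, for use in update (4.4).
    """
    z = np.asarray(scores, dtype=float)
    g = z.sum(axis=0)            # total gradient of the log-likelihood
    zz = z.T @ z                 # sum over observations of outer products z_i z_i'
    P = np.linalg.inv(zz)        # exists when z has full column rank
    return P, g
```

Because `z.T @ z` is a sum of outer products, its eigenvalues can never be negative, so only second derivatives of the likelihood, not the BHHH matrix, can spoil the search direction.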




The log-likelihood function for the model Tobit2 is (2.18), and that
of the model Tobit3 is (2.19). The formula for the gradient vector used
in the iteration procedure is the first-order condition of the
concentrated log-likelihood function expressed in equations (2.24). Each
term of formula (2.24) can be obtained from the first-order conditions of
the specific log-likelihood function: equations (2.18a)-(2.18c) for the
model Tobit2 and equations (2.19a)-(2.19c) for the model Tobit3. The
moment matrix of each observation is accumulated to guarantee the
positive definiteness of the matrix $z'z$. The step length is set equal
to one at the start of the iterations and is then reduced when both the
log-likelihood value and the signs of the gradient vector oscillate. With
this adjustment, convergence was achieved in every case, although
relatively more iterations were needed. The estimation results are
reported in columns 4 and 5 of the tables.3
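The text reduces the step length when the log-likelihood and the gradient signs oscillate. A common stand-in, swapped in here as an assumption rather than the author's exact rule, is to start at $s = 1$ and halve $s$ until the log-likelihood does not fall; the function name is hypothetical.

```python
def choose_step(loglik, beta, direction, s0=1.0, shrink=0.5, min_step=1e-8):
    """Backtracking step length for update (4.4).

    Starts from s0 = 1 and shrinks s until the log-likelihood does not decrease.
    """
    base = loglik(beta)
    s = s0
    while s >= min_step:
        candidate = beta + s * direction
        if loglik(candidate) >= base:
            return s, candidate
        s *= shrink
    return 0.0, beta  # no acceptable step found
```

Here `direction` would be $P g$ from (4.4); accepting only non-decreasing steps trades extra likelihood evaluations for more reliable convergence.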

The resulting parameter estimates cannot be said to be consistent,
because the number of time-series observations for each firm is fixed at
T = 11. This inconsistency problem is similar to the fixed effects problem
in Heckman and MaCurdy's study. The estimates of $\sigma_i^2$ are
generally inconsistent, since each is estimated from only 11
observations, and this inconsistency carries through to the other
parameter estimates. However, according to Heckman and MaCurdy, the fixed
effects problem is not severe enough to affect the other structural
parameter estimates seriously. Moreover, in this study the estimates of
the disturbance parameters are obtained approximately in order to make
estimation feasible; that is, the inconsistency of the parameter
estimates is the cost of solving the estimation problem. Therefore, the
incidental parameters problem is not examined intensively in this study.

3 The estimates of the disturbance parameters are not reported
because they are not of interest in this study.

Another problem with the estimation is identification. If the
dividend payouts in all the observations for a firm are zero, the
estimate of $\sigma_i^2$ is zero, and $\sigma_i^2$ is not identified. The
problem is avoided by setting the value of any such unidentified
$\sigma_i^2$ to the average value of the identified $\sigma_i^2$'s.4
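The replacement rule for unidentified firm variances can be sketched in a few lines; the function name and argument layout are hypothetical.

```python
import numpy as np


def fill_unidentified(sigma2_i, D):
    """Replace sigma_i^2 for firms with all-zero dividends by the mean of the others.

    sigma2_i: length-N firm variances; D: N x T matrix of observed dividends.
    """
    s = np.asarray(sigma2_i, dtype=float).copy()
    identified = (np.asarray(D) > 0).any(axis=1)   # firm paid a dividend at least once
    s[~identified] = s[identified].mean()
    return s
```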

The estimation results of the models Tobit1, Tobit2 and Tobit3 differ
in the significance of the coefficient estimates. The coefficients on the
lagged dividend variable and the earnings variable are significant in all
cases. However, the significance of the coefficients on GROWTH, DER and
CSH differs across the model specifications, as summarized in Table
4.12. This difference is important because these coefficients can be used
to test the validity of the informational hypotheses, specifically the
signaling hypothesis and the agency cost hypothesis, and to evaluate the
Miller-Modigliani dividend irrelevancy proposition. Since the estimation
results depend on the model specification, we must first examine which of
the three specifications is the most appropriate.

4 Heckman and MaCurdy (1980) avoid the identification problem by
discarding the cross-section units whose fixed effect is not identified.
However, their solution suffers from a selectivity bias problem.

For the evaluation of the different specifications, two test
statistics, (2.28) and (2.29), discussed in Chapter 2, are obtained from
the regression results. The $\chi^2$ statistic reported in the tables
tests the assumption that all the off-diagonal elements of the error
covariance matrix are zero. Since all the suggested specifications assume
zero error covariances, this test can serve as a criterion for evaluating
the error covariance specifications. The results for the hypothesis that
$\mathrm{cov}(u_{it}, u_{is}) = 0$ for $t \ne s$ are as follows: with the
specification of models Tobit1 or Tobit2, the tests for 7 industries
reject the null hypothesis, while with the specification of Tobit3 the
null hypothesis is rejected in only one industry. These results suggest
that the error covariance specification used in the model Tobit3 is the
most appropriate.

The other test applied to examine the suggested specifications is the
parameter stability test. It compares the maximized values of the
log-likelihood function from two data sets: a subset that excludes the
part of the sample reserved for prediction, and the entire data set. The
parameter stability test is an analogue of the Chow test in the sense
that model specifications are evaluated by comparing the predictability
or stability of the models; a better specification should give more
stable results. The stability test statistic is distributed as $\chi^2$
with degrees of freedom equal to $NT - N_1T$, where $NT$ is the number of
observations in the whole sample and $N_1T$ is the number of observations
in the subsample. Since all the degrees of freedom for the test in this
study are greater than 100, standardized normal statistics are calculated
and reported. A smaller value of the statistic indicates that the
parameter estimates are more stable, and thus that the model is better
specified than models that produce larger values of the stability test
statistic.5 The parameter stability tests for the models with the various
error covariance structures show that the model Tobit3 is relatively more
stable: the null hypothesis of stable parameters cannot be rejected for 4
industries with Tobit3, but for only 2 industries with models Tobit1 and
Tobit2. Moreover, the statistics from the model Tobit3 are smaller than
those from the other models in most industries. Therefore, the model
Tobit3 can be said to be more stable than the other models, which
supports the argument that the error covariance structure assumed in the
model Tobit3 is better than the other error covariance specifications.
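The text does not state which normal approximation was used for the large-degrees-of-freedom $\chi^2$ statistic. One common choice, assumed here, is $z = (\chi^2 - m)/\sqrt{2m}$ for $m$ degrees of freedom, since a $\chi^2(m)$ variable has mean $m$ and variance $2m$; the sketch pairs it with the 1.96 cutoff of footnote 5, and the function names are hypothetical.

```python
import math


def chi2_to_z(stat, df):
    """Standardize a chi-square statistic: mean df, variance 2*df (large-df approximation)."""
    return (stat - df) / math.sqrt(2.0 * df)


def stable(z, critical=1.96):
    # the null of stable parameters is rejected when z exceeds the critical value
    return z <= critical
```

For example, a statistic of 130 with 100 degrees of freedom gives $z = 30/\sqrt{200} \approx 2.12$, which would reject stability at the 1.96 cutoff.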

The conventional $R^2$'s, which indicate the explanatory power of the
models, could serve as a supplementary criterion for evaluating the
specifications. The $R^2$'s for the models Tobit1 and Tobit3 are not much
different, while those of the model Tobit2 are relatively small. That is,
the explanatory power of the model Tobit2 is not as good as that of the
models Tobit1 and Tobit3.

From these results, we conclude that the model Tobit3, which assumes
the heteroscedastic error covariance structure (4.2c), is a better
specification than the fixed effects tobit model or the model Tobit2 with
the heteroscedastic error covariance structure (4.2b).

5 The critical value of the standard normal statistic at the 5%
significance level is 1.96. A statistic greater than the critical value
rejects the null hypothesis that the parameters are stable.


The estimation results of Tobit3 can now be used to interpret the
dividend behavior of the U.S. manufacturing industry. The lagged dividend
variable and the earnings variable significantly explain dividend
behavior. The only difference between this model and the original Lintner
model concerns the significance of the constant term. Lintner includes a
constant term to reflect the greater reluctance to reduce dividends than
to raise them, and in his empirical study he found the constant term to
be significantly positive. The constant term in the model Tobit3 has a
slightly different meaning: because it is common across all the firms, it
can be interpreted as the average of the firm specific effects or as the
intercept for the industry.6 Since the constant terms for all the
industries are not significantly different from zero, the average of the
firm specific effects is consistent with zero, and so there is no
evidence of a reluctance to cut dividends.

The signaling hypothesis and the agency cost hypothesis can also be
examined using the estimation results. The growth rate and the
debt-equity ratio are included in the regression as proxy variables that
indicate the firm's state. According to the signaling hypothesis, firms
in a good state will pay more dividends than firms in a bad state.
However, the estimation results do not support this hypothesis: the
hypothesis of a zero coefficient on the growth rate variable cannot be
rejected for any of the industries, and the hypothesis of a zero
coefficient on the debt-equity ratio is rejected in only two industries.
According to the agency cost hypothesis, the number of common
shareholders should be related to the level of dividend payout. However,
the estimation results show no evidence supporting this hypothesis: the
hypothesis of a zero coefficient on this variable is rejected in only two
industries.

6 If we interpret the constant term as the average of the firm
specific effects, it is expected to be zero under the random effects
specification, where $E(\alpha_i) = 0$.




TABLE 4.1  STRUCTURE OF THE DATA SETS

DNUM        Firms   Obs.  % of zero obs.  Industry
2000-2099      47    517           14.31  Food and kindred product
2200-2300      47    517           40.81  Textile mill product and apparel industry
2600-2771      57    627           11.64  Paper and allied product, printing and publishing
2800-2890      81    891            6.85  Chemical and allied product
2911-3079      63    693           18.76  Petroleum, rubber, plastic
3310-3499      80    880           18.98  Metal product
3510-3590      77    847           20.43  Machinery
3600-3679     107   1177           31.76  Electrical machinery
3680-3699      37    407           58.23  Computing machinery
3711-3790      53    583           22.30  Motor vehicle, aircraft and transportation equipment

Note: DNUM is the industry classification number in the Industrial
Compustat tape.
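The panel dimensions in Table 4.1 can be checked with a few lines of Python: every row has exactly 11 observations per firm, matching T = 11 used in the text. The list below is transcribed from the table; the attribution of the twelfth year to the lagged dividend is our presumption, not a statement from the text.

```python
# (DNUM range, firms, observations) transcribed from Table 4.1
TABLE_4_1 = [
    ("2000-2099", 47, 517), ("2200-2300", 47, 517), ("2600-2771", 57, 627),
    ("2800-2890", 81, 891), ("2911-3079", 63, 693), ("3310-3499", 80, 880),
    ("3510-3590", 77, 847), ("3600-3679", 107, 1177), ("3680-3699", 37, 407),
    ("3711-3790", 53, 583),
]

# 12 years (1976-1987), presumably one lost to the lagged dividend: 11 per firm
PERIODS = 11

for dnum, firms, obs in TABLE_4_1:
    assert obs == firms * PERIODS, dnum

total_firms = sum(f for _, f, _ in TABLE_4_1)
total_obs = sum(o for _, _, o in TABLE_4_1)
print(total_firms, total_obs)  # 649 7139
```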




TABLE 4.2  ESTIMATION AND TEST RESULTS OF FOOD AND KINDRED PRODUCT INDUSTRY

Variable          LSDV         GLS      Tobit1      Tobit2      Tobit3
LDPS             .4176       .5471       .7116       .6582       .6257
              (11.916)    (19.135)    (26.692)    (19.715)    (16.307)
EPS              .1280       .1193       .1090       .0983       .1089
              (15.193)    (15.202)    (14.270)    (13.599)    (10.984)
GROWTH          -.1481      -.1551      -.1747      -.1032      -.1541
              (-2.044)    (-2.170)    (-2.410)     (-.700)    (-1.051)
DER              .0257      -.0684       .0006      -.0878      -.0300
                (.315)    (-1.183)      (.186)    (-1.491)     (-.515)
CSH              .0900       .1051       .0847       .0910       .1012
               (1.358)     (3.175)     (6.463)     (4.121)     (3.566)
ONE                         -.0548      -.2256      -.1812      -.1563
                           (-.948)    (-7.384)    (-3.332)    (-2.667)
R2               .9061       .9296       .9568       .8517       .9139
-logL                                   143.30      118.57      78.871
LM                          12.792
STABILITY                               -1.198      -3.104        .204
χ²(43)                                  113.83      108.46       5.953

a The values in parentheses under the coefficient estimates are t-statistics.
b The stability test statistics are standardized normal.
c The number in parentheses after χ² is the degrees of freedom.


TABLE 4.3  ESTIMATION AND TEST RESULTS OF TEXTILE MILL AND APPAREL INDUSTRY

Variable          LSDV         GLS      Tobit1      Tobit2      Tobit3
LDPS             .5547       .8097       .9425       .9287       .8918
              (12.123)    (33.850)    (39.713)    (41.373)    (23.228)
EPS              .0507       .0488       .0565       .0588       .0435
               (6.808)     (8.729)     (9.248)     (9.152)     (4.977)
GROWTH          -.0490      -.0010      -.0147      -.0389       .0140
               (-.936)     (-.022)     (-.498)     (-.965)      (.265)
DER              .0920       .0031      -.0582      -.0729      -.0477
               (1.640)      (.089)    (-2.833)    (-1.797)     (-.833)
CSH              .0141       .0345       .0115       .0163       .0101
                (.395)     (2.731)     (1.010)     (.1721)      (.407)
ONE                         -.0156      -.1012      -.1061      -.0103
                           (-.793)    (-6.090)    (-4.633)     (-.243)
R2               .9319       .9042       .9581       .9343       .9609
-logL                                    9.448      29.229      16.044
LM                           .0105
STABILITY                               -1.354      -1.238      -1.186
χ²(55)                                  135.50       76.50       31.21

a Cf. Table 4.2.


TABLE 4.4  ESTIMATION AND TEST RESULTS OF PAPER, PRINTING AND PUBLISHING INDUSTRY

Variable     LSDV       GLS        Tobit1     Tobit2     Tobit3
LDPS          .5255      .6728      .7609      .6303      .6732
             (18.902)   (31.312)   (37.201)   (26.091)   (17.608)
EPS           .1157      .1028      .1017      .1147      .1084
             (14.724)   (14.565)   (14.571)   (8.372)    (6.941)
GROWTH       -.3106     -.2303     -.1440     -.2280     -.2068
             (-3.989)   (-3.241)   (-2.078)   (-2.560)   (-1.316)
DER           .1627     -.0013      .0098     -.0671     -.0295
             (3.015)    (-.040)    (1.155)    (-2.530)   (-.538)
CSH           .0013      .0677      .0599      .0440      .0532
             (.037)     (5.465)    (6.043)    (2.528)    (1.188)
R²            .9129      .9533      .9669      .8422      .9204

Rows reported for only some of the models (entries in print order):
ONE          -.0563 (-1.735)   -.1647 (-6.882)   -.0548 (-1.755)   -.0650 (-1.017)
-ln L        24.408     41.872     13.872
LM           17.804
STABILITY    4.431      6.282      3.371
χ²(55)       237.27     128.53     86.68

Note: cf. the notes to Table 4.2.


TABLE 4.5  ESTIMATION AND TEST RESULTS OF CHEMICAL INDUSTRY

Variable     LSDV       GLS        Tobit1     Tobit2     Tobit3
LDPS          .6099      .7795      .7911      .7991      .7987
             (25.108)   (48.757)   (47.425)   (61.471)   (41.546)
EPS           .0737      .0708      .0750      .0687      .0646
             (12.055)   (13.064)   (13.085)   (25.398)   (15.488)
GROWTH       -.2531     -.2186     -.2380     -.1747     -.1662
             (-3.553)   (-3.239)   (-3.430)   (-3.443)   (-1.433)
DER           .0354     -.0348     -.0671     -.0664     -.0468
             (.558)     (-1.005)   (-2.375)   (-2.009)   (-.714)
CSH          -.1088      .0415      .0505      .0387      .0357
             (-2.226)   (4.945)    (5.913)    (3.926)    (1.151)
R²            .9161      .9649      .9635      .9305      .9355

Rows reported for only some of the models (entries in print order):
ONE           .0239 (1.043)    -.0313 (-1.420)   -.0199 (-.771)     .0038 (.081)
-ln L        145.39     64.958     26.443
LM           .2618
STABILITY    9.678      5.040      4.554
χ²(…)        127.76     69.35      11.32

Note: cf. the notes to Table 4.2.


TABLE 4.6  ESTIMATION AND TEST RESULTS OF PETROLEUM AND PLASTIC INDUSTRY

Variable     LSDV       GLS        Tobit1     Tobit2     Tobit3
LDPS          .5384      .7339      .8120      .7947      .8042
             (15.929)   (32.090)   (37.405)   (36.085)   (27.787)
EPS           .0379      .0447      .0537      .0413      .0431
             (7.116)    (9.337)    (11.667)   (8.602)    (6.866)
GROWTH       -.1265     -.1426     -.1700     -.1119     -.1141
             (-1.820)   (-2.233)   (-2.974)   (-1.581)   (-.914)
DER          -.0514     -.0328     -.0465     -.0919     -.1225
             (-1.190)   (-1.030)   (-2.356)   (-3.854)   (-6.448)
CSH           .0328      .0650      .0537      .0586      .0472
             (.525)     (5.818)    (5.379)    (4.683)    (3.064)
R²            .9133      .9517      .9578      .9103      .9350

Rows reported for only some of the models (entries in print order):
ONE           .0517 (1.747)    -.0728 (-3.155)   -.0315 (-.998)     .0245 (.577)
-ln L        198.16     82.613     64.136
LM           .0814
STABILITY    5.676      7.990      -3.060
χ²(54)       69.14      55.21      49.47

Note: cf. the notes to Table 4.2.


TABLE 4.7  ESTIMATION AND TEST RESULTS OF METAL PRODUCT INDUSTRY

Variable     LSDV       GLS        Tobit1     Tobit2     Tobit3
LDPS          .5724      .7610      .8870      .8310      .7896
             (21.030)   (41.444)   (49.663)   (61.699)   (36.993)
EPS           .0533      .0486      .0471      .0508      .0518
             (13.020)   (13.161)   (12.289)   (15.491)   (13.192)
GROWTH       -.0702     -.0521      .0062     -.0223     -.0347
             (-1.550)   (-1.230)   (.158)     (-.534)    (-.597)
DER           .0083     -.0007     -.0313     -.0638     -.0314
             (.312)     (-.033)    (-2.137)   (-2.771)   (-1.035)
CSH          -.0133      .0391      .0185      .0137      .0247
             (-.288)    (3.947)    (1.922)    (1.259)    (1.759)
R²            .8615      .9375      .9361      .8723      .9039

Rows reported for only some of the models (entries in print order):
ONE           .0505 (2.314)    -.0742 (-3.717)   -.0299 (-1.259)    .0133 (.330)
-ln L        235.86     76.258     53.045
LM           1.268
STABILITY    6.316      8.085      9.440
χ²(72)       74.37      33.18      13.90

Note: cf. the notes to Table 4.2.


TABLE 4.8  ESTIMATION AND TEST RESULTS OF MACHINERY INDUSTRY

Variable     LSDV       GLS        Tobit1     Tobit2     Tobit3
LDPS          .5468      .6987      .8859      .9057      .8235
             (18.927)   (32.385)   (58.785)   (62.185)   (27.121)
EPS           .0485      .0354      .0560      .0515      .0503
             (11.644)   (15.662)   (14.735)   (14.616)   (7.126)
GROWTH       -.1175     -.0618     -.0674     -.0888     -.1115
             (-2.430)   (-2.186)   (-1.661)   (-3.045)   (-1.217)
DER           .0115     -.0189     -.0120     -.0216     -.0087
             (.352)     (-1.043)   (-1.004)   (-1.710)   (-.408)
CSH           .0365      .0743      .0286     -.0034      .0299
             (.881)     (.995)     (3.292)    (-.396)    (1.817)
R²            .9197      .9447      .9575      .9047      .9219

Rows reported for only some of the models (entries in print order):
ONE           .0654 (1.207)    -.1068 (-5.748)   -.0639 (-3.383)   -.0181 (-.434)
-ln L        126.59     30.691     10.750
LM           8.126
STABILITY    -8.445     3.356      3.140
χ²(69)       190.31     159.30     25.63

Note: cf. the notes to Table 4.2.


TABLE 4.9  ESTIMATION AND TEST RESULTS OF ELECTRICAL MACHINERY INDUSTRY

Variable     LSDV       GLS        Tobit1     Tobit2     Tobit3
LDPS          .5879      .8189      .9246      .8890      .7933
             (25.276)   (55.424)   (64.042)   (61.722)   (42.353)
EPS           .0575      .0525      .0546      .0632      .0703
             (11.377)   (12.607)   (12.994)   (14.045)   (11.724)
GROWTH       -.0840     -.0630     -.0621     -.0732     -.0970
             (-2.242)   (-1.836)   (-2.045)   (-1.587)   (-1.320)
DER           .0307     -.0148     -.0426     -.0977     -.0863
             (1.044)    (-.760)    (-3.047)   (-4.786)   (-2.952)
CSH          -.0088      .0164      .0119      .0086      .0085
             (-.298)    (2.656)    (2.086)    (1.106)    (.753)
R²            .9391      .9514      .9624      .9177      .9554

Rows reported for only some of the models (entries in print order):
ONE           .0151 (1.098)    -.1081 (-9.209)   -.1027 (-5.620)   -.0117 (-.410)
-ln L        63.783     50.702     37.042
LM           10.183
STABILITY    -2.818     -4.474     -2.184
χ²(…)        416.86     155.58     9.177

Note: cf. the notes to Table 4.2.


TABLE 4.10  ESTIMATION AND TEST RESULTS OF COMPUTING MACHINERY INDUSTRY

Variable     LSDV       GLS        Tobit1     Tobit2     Tobit3
LDPS          .4689      .5836      .7756      .9454      .9039
             (8.537)    (13.272)   (19.541)   (38.536)   (36.053)
EPS           .1774      .1692      .1614      .0886      .0778
             (8.970)    (9.389)    (9.369)    (6.650)    (5.901)
GROWTH       -.9308     -.8491     -1.0678    -.7438     -.5101
             (-2.912)   (-2.761)   (-4.818)   (-2.592)   (-1.722)
DER           .0904      .0922     -.1512     -.1996     -.1940
             (.624)     (.752)     (-2.113)   (-3.663)   (-1.645)
CSH          -.0256      .0278     -.0165     -.0408     -.0249
             (-.115)    (.739)     (-.536)    (-1.294)   (-.661)
R²            .8946      .8949      .9113      .9553      .9675

Rows reported for only some of the models (entries in print order):
ONE          -.0796 (-.851)    -.3529 (-5.043)   -.2373 (-2.882)   -.0627 (-.622)
-ln L        237.94     164.35     129.18
LM           .2824
STABILITY    -2.782     -1.793     -1.928
χ²(17)       68.99      132.51     19.72

Note: cf. the notes to Table 4.2.


TABLE 4.11  ESTIMATION AND TEST RESULTS OF MOTOR AND AIRCRAFT INDUSTRY

Variable     LSDV       GLS        Tobit1     Tobit2     Tobit3
LDPS          .6229      .7622      .9452      .9433      .8830
             (19.064)   (35.529)   (41.333)   (47.989)   (25.086)
EPS           .0656      .0668      .0350      .0324      .0344
             (12.869)   (14.592)   (9.744)    (19.700)   (9.195)
GROWTH       -.0216     -.0251      .0292      .0049      .0047
             (-.258)    (-.315)    (.853)     (.169)     (.060)
DER           .0899      .0229     -.0201      .0152     -.0669
             (1.809)    (.715)     (-1.334)   (4.598)    (-1.435)
CSH           .1669      .0362      .0084     -.0089      .0071
             (2.225)    (3.306)    (.679)     (-.895)    (.3802)
R²            .8975      .9523      .9282      .9176      .9249

Rows reported for only some of the models (entries in print order):
ONE          -.0322 (-1.015)   -.1384 (-4.478)   -.0965 (-3.724)   -.0001 (-.0019)
-ln L        294.69     175.05     144.42
LM           .0303
STABILITY    2.631      -3.261     -1.005
χ²(47)       165.56     102.41     21.55

Note: cf. the notes to Table 4.2.


TABLE 4.12  THE SIGNIFICANCE OF THE COEFFICIENT ESTIMATES OF THE VARIABLES

             LDPS    EPS    GROWTH    DER    CSH
LSDV            0      0         4      9      8
GLS             0      0         4     10      2
Tobit1          0      0         4      4      4
Tobit2          0      0         6      4      6
Tobit3          0      0        10      8      8

The numbers indicate the number of data sets (out of ten) for which the
null hypothesis that the coefficient equals zero cannot be rejected at
the 5% significance level.
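As a check on how Table 4.12 is built, its counts can be reproduced from the t-statistics in Tables 4.2 through 4.11. The Python sketch below assumes that "cannot be rejected" means |t| < 1.96, the two-sided 5% critical value of the standard normal distribution; only the GROWTH entries for LSDV and Tobit3 are transcribed, and the helper name count_not_rejected is illustrative.

```python
# Two-sided 5% critical value of the standard normal distribution (assumed cutoff).
CRITICAL_5PCT = 1.96

# GROWTH t-statistics transcribed from Tables 4.2-4.11, one entry per industry.
GROWTH_T = {
    "LSDV": [-2.044, -0.936, -3.989, -3.553, -1.820,
             -1.550, -2.430, -2.242, -2.912, -0.258],
    "Tobit3": [-1.051, 0.265, -1.316, -1.433, -0.914,
               -0.597, -1.217, -1.320, -1.722, 0.060],
}


def count_not_rejected(t_stats, critical=CRITICAL_5PCT):
    """Count data sets where H0: coefficient = 0 is not rejected, i.e. |t| < critical."""
    return sum(1 for t in t_stats if abs(t) < critical)


for model, t_stats in GROWTH_T.items():
    print(model, count_not_rejected(t_stats))
```

With these inputs the sketch gives 4 for LSDV and 10 for Tobit3, matching the GROWTH column of Table 4.12.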


CHAPTER  5 

SUMMARY  AND  CONCLUSIONS 

This study consists of two parts. The first is the specification
analysis and estimation of censored panel data models, discussed in
Chapter 2. The second is the empirical study of the dividend behavior of
the U.S. manufacturing industry, discussed in Chapter 4.

Specification and estimation methods developed for continuous panel
data and for the standard tobit model cannot be combined directly: least
squares methods do not produce unbiased or consistent estimates for
censored data, while maximum likelihood, when applied to censored panel
data, runs into problems caused by the complicated error covariance
structure if the model assumes random effects. In Chapter 2, estimation
methods for censored data and the specification analysis and estimation
methods for panel data are discussed. Then some alternative error
covariance structures are suggested to simplify the estimation problem.
All the off-diagonal elements of the error covariance matrix are assumed
to be zero, and the diagonal elements are assumed to be heteroscedastic.
Two disturbance parameters, $\sigma_i^2$ and $\delta_t^2$, which
characterize the errors for the cross-sections and for the time periods
respectively, are introduced to form the heteroscedasticity. Among
several possible combinations of $\sigma_i^2$ and $\delta_t^2$, the
arithmetic mean and the geometric mean are used to model the
heteroscedasticity:

$$\operatorname{var}(u_{it}) = \tfrac{1}{2}\left(\sigma_i^2 + \delta_t^2\right) \quad \text{or} \quad \operatorname{var}(u_{it}) = \left(\sigma_i^2 \delta_t^2\right)^{1/2}.$$

However, the suggested error covariance structures create an estimation
problem because many parameters must be estimated. To solve this
problem, the concentrated maximum likelihood method is suggested.
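To make the two variance structures concrete, the following Python sketch evaluates a tobit log-likelihood, censored at zero, in which the off-diagonal covariances are zero and the variance of $u_{it}$ is either the arithmetic mean $(\sigma_i^2 + \delta_t^2)/2$ or the geometric mean $\sigma_i \delta_t$. It is an illustrative sketch under stated assumptions (a linear index $x_{it}'\beta$, data held in nested lists, hypothetical function names); it only evaluates the likelihood and does not reproduce the concentrated maximum likelihood procedure.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * z * z


def norm_logcdf(z):
    """Log of the standard normal CDF, computed through erfc."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def error_variance(sigma2_i, delta2_t, structure):
    """Diagonal element var(u_it) under the assumed heteroscedastic structure."""
    if structure == "arithmetic":
        return 0.5 * (sigma2_i + delta2_t)
    if structure == "geometric":
        return math.sqrt(sigma2_i * delta2_t)
    raise ValueError("structure must be 'arithmetic' or 'geometric'")


def tobit_loglik(y, X, beta, sigma2, delta2, structure="geometric"):
    """Tobit log-likelihood with zero off-diagonal covariances (hypothetical helper).

    y[i][t]: dependent variable censored at zero; X[i][t]: regressor vector;
    sigma2[i]: cross-section parameter; delta2[t]: time-period parameter.
    """
    total = 0.0
    for i, (y_i, X_i) in enumerate(zip(y, X)):
        for t, (y_it, x_it) in enumerate(zip(y_i, X_i)):
            s = math.sqrt(error_variance(sigma2[i], delta2[t], structure))
            mu = sum(b * x for b, x in zip(beta, x_it))
            if y_it <= 0.0:
                # Censored observation contributes P(y* <= 0) = Phi(-mu / s).
                total += norm_logcdf(-mu / s)
            else:
                # Uncensored observation contributes the normal density of y_it.
                total += norm_logpdf((y_it - mu) / s) - math.log(s)
    return total


# One firm, two periods: the first observation censored, the second positive.
ll = tobit_loglik([[0.0, 0.5]], [[[1.0], [1.0]]], [0.0], [1.0], [1.0, 1.0])
print(ll)
```

Maximizing this function over $\beta$, $\sigma_i^2$ and $\delta_t^2$ jointly involves many parameters, which is the difficulty the concentrated method is meant to address.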

The error covariance structures suggested in this study are arbitrary
and should be tested. A $\chi^2$ statistic for testing the null
hypothesis $\operatorname{cov}(u_{it}, u_{is}) = 0$ for $t \neq s$ is
derived from the formula for the truncated normal distribution. This
test can be used to examine the validity of the assumption that the
off-diagonal elements of the error covariance matrix are zero. The other
test is an analogue of the Chow test, developed by Anderson (1987) to
examine parameter stability in limited dependent variable models. These
two statistics are proposed as criteria for evaluating the
specifications suggested in the study.
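For reading the test statistics reported in Tables 4.2 through 4.11, the stability statistic is referred to the standard normal distribution and the covariance test statistic to a $\chi^2$ distribution. The Python sketch below uses only the standard library; the $\chi^2$ tail probability relies on the Wilson-Hilferty cube-root approximation rather than the exact distribution, an assumption made here for self-containment, and the function names are illustrative. The inputs are values printed in Table 4.2.

```python
import math


def normal_two_sided_p(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def chi2_upper_p_wh(x, df):
    """Approximate P(X >= x) for X ~ chi-square(df) (Wilson-Hilferty).

    (X / df) ** (1/3) is treated as normal with mean 1 - 2/(9 df)
    and variance 2/(9 df).
    """
    h = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - h)) / math.sqrt(h)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# A stability statistic and two chi-square(43) statistics printed in Table 4.2.
print(round(normal_two_sided_p(-3.104), 4))
print(chi2_upper_p_wh(113.83, 43) < 0.05, chi2_upper_p_wh(5.953, 43) < 0.05)
```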

In Chapter 3, theories of and empirical studies on dividend policy are
surveyed. The principal questions discussed are "why do firms pay
dividends?" and "what determines dividend policy?" The Miller and
Modigliani dividend irrelevancy proposition, which is widely accepted in
the field of finance, argues that the value of the firm is not affected
by its dividend policy. Therefore, under the M-M proposition, variables
related to the value of the firm have no relationship with dividend
policy and cannot be used as explanatory variables in a regression model
of dividend behavior.

Recent studies on dividend behavior try to explain, from the
perspective of information theory, why firms pay dividends in spite of
the costs of doing so. The signaling hypothesis and the agency cost
hypothesis are applications of information theory; both are based on the
idea that dividends convey information.

All three hypotheses (the M-M proposition, the signaling hypothesis,
and the agency cost hypothesis) can be derived from the same equation,
the sources and uses of funds identity. The validity of the hypotheses
is an empirical question, and there have been many empirical studies on
these issues. The underlying framework in almost all of them is
regression analysis of the Lintner model, which formalizes a partial
adjustment process for dividend payouts. Depending on how the target
value is specified, various variables enter the regression; Lintner
himself uses current earnings in place of the target value. The Lintner
model has been found to be consistent with real data.
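In its usual form the Lintner model sets a target $D_{it}^* = r E_{it}$ and lets dividends adjust partially toward it, $D_{it} - D_{i,t-1} = a + c\,(D_{it}^* - D_{i,t-1}) + u_{it}$, which gives $D_{it} = a + (1-c) D_{i,t-1} + c\,r E_{it} + u_{it}$. If the LDPS and EPS coefficients in Tables 4.2 through 4.11 are read as $1-c$ and $cr$ (an interpretation assumed here that ignores the other regressors and the censoring), the speed of adjustment $c$ and the target payout ratio $r$ can be backed out as in this Python sketch; lintner_parameters is a hypothetical helper.

```python
def lintner_parameters(b_lagged_dividend, b_earnings):
    """Back out (c, r) from D_t = a + (1 - c) D_{t-1} + c r E_t.

    c is the speed of adjustment and r the target payout ratio.
    """
    c = 1.0 - b_lagged_dividend
    if c <= 0.0:
        raise ValueError("lagged-dividend coefficient must be below 1")
    return c, b_earnings / c


# LDPS and EPS estimates from the Tobit3 column of Table 4.2.
c, r = lintner_parameters(0.6257, 0.1089)
print(f"speed of adjustment c = {c:.3f}, target payout ratio r = {r:.3f}")
```

For the Tobit3 estimates of Table 4.2 this gives c ≈ 0.374 and r ≈ 0.291.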

The error covariance structures and estimation methods discussed in
Chapter 2 are applied to a dividend behavior model within the framework
of the Lintner model. The data for U.S. manufacturing firms are obtained
from the Compustat tape and are divided into ten subgroups according to
the industry classification of the PSID data.

First, the fixed effects tobit model is estimated by maximum
likelihood. Then models with specific heteroscedasticity in the error
covariance structure are estimated by concentrated maximum likelihood.
According to the specification tests, model Tobit3 turns out to be the
most appropriate specification. In this model, each diagonal element of
the heteroscedastic error covariance matrix is assumed to equal the
geometric mean of $\sigma_i^2$ and $\delta_t^2$, which characterize the
distribution of the errors for each cross-section and each time period
respectively. The estimates from model Tobit3 are used to interpret the
dividend behavior of the U.S. manufacturing industry.

The  conclusions  about  the  dividend  behavior  of  the  U.S. 
manufacturing  industry  drawn  from  the  estimation  are  as  follows: 

1.  The  lagged  dividend  variable  and  the  current  earnings  variable,  which 
are  used  in  the  original  Lintner  model,  significantly  explain  the 
dividend  behavior  of  the  U.S.  manufacturing  industry.  In  other 
words,  the  Lintner  model  is  again  supported. 

2. The hypothesis of a reluctance to cut dividends, which has been
reported as valid in several cross-sectional studies, is not supported
in this panel study. That is, there is no evidence of any reluctance
to cut dividends in industrial dividend behavior.

3. The informational hypotheses of dividend policy are rejected in
this empirical study. We cannot find any evidence for the signaling
hypothesis in the estimation results, nor is the agency cost
hypothesis supported.

4.  The  Miller  and  Modigliani  dividend  irrelevancy  proposition  is  not 
rejected  because  variables  related  to  the  value  of  the  firm  are  not 
significant  in  explaining  dividend  behavior. 


REFERENCES


Aharony, J. and I. Swary, "Quarterly Dividend and Earnings Announcements
and Stockholders' Returns: An Empirical Analysis," Journal of
Finance, 35(1980), 1-12.

Akerlof, G.A., "The Market for 'Lemons': Qualitative Uncertainty and the
Market Mechanism," Quarterly Journal of Economics, 84(1970), 488-500.

Amemiya, T., "Regression Analysis When the Dependent Variable is
Truncated Normal," Econometrica, 41(1973), 997-1016.

Amemiya, T., "Tobit Models: A Survey," Journal of Econometrics,
24(1984), 1-61.

Anderson, G.J., "An Application of the Tobit Model to Panel Data:
Modelling Dividend Behavior in Canada," Mimeo, Economics
Discussion Paper, McMaster University, 1986.

Anderson, G.J., "Prediction Tests in Limited Dependent Variable Models,"
Journal of Econometrics, 34(1987), 253-261.

Anderson, T.W. and C. Hsiao, "Formulation and Estimation of Dynamic
Models Using Panel Data," Journal of Econometrics, 18(1982), 47-82.

Arabmazar, A. and P. Schmidt, "Further Evidence on the Robustness of the
Tobit Estimator to Heteroskedasticity," Journal of Econometrics,
17(1981), 253-258.

Arabmazar, A. and P. Schmidt, "An Investigation of the Robustness of the
Tobit Estimator to Non-Normality," Econometrica, 50(1982), 1055-1063.

Balestra, P.B. and M. Nerlove, "Pooling Cross Section and Time Series
Data in the Estimation of a Dynamic Model: The Demand for Natural
Gas," Econometrica, 34(1966), 585-612.

Berndt, E.R., B.H. Hall, R.E. Hall, and J.A. Hausman, "Estimation and
Inference in Nonlinear Structural Models," Annals of Economic and
Social Measurement, 3(1974), 653-665.

Bhargava, A. and J.D. Sargan, "Estimating Dynamic Random Effects Models
from Panel Data Covering Short Time Periods," Econometrica,
51(1983), 1635-1659.


Bhattacharya, S., "Imperfect Information, Dividend Policy, and 'the Bird
in the Hand' Fallacy," Bell Journal of Economics, 10(1979), 259-270.

Black, F. and M.S. Scholes, "The Effects of Dividend Yield and Dividend
Policy on Common Stock Prices and Returns," Journal of Financial
Economics, 1(1974), 1-22.

Brennan, M.J., "Taxes, Market Valuation and Corporate Financial Policy,"
National Tax Journal, 23(1970), 417-427.

Breusch, T.S. and A.R. Pagan, "The Lagrange Multiplier Test and Its
Applications to Model Specification in Econometrics," Review of
Economic Studies, 47(1980), 239-253.

Brittain, J.A., Corporate Dividend Policy, Washington D.C., The
Brookings Institution, 1966.

Eades, K.M., "Empirical Evidence on Dividends as a Signal of Firm
Value," Journal of Financial and Quantitative Analysis, 17(1982),
471-500.

Easterbrook, F.H., "Two Agency-Cost Explanations of Dividends," American
Economic Review, 74(1984), 650-659.

Fama, E.F., "The Empirical Relationships Between the Dividend and
Investment Decisions of Firms," American Economic Review,
64(1974), 304-318.

Fama, E.F., "Agency Problems and the Theory of the Firm," Journal of
Political Economy, 88(1980), 288-307.

Fama, E.F. and H. Babiak, "Dividend Policy: An Empirical Analysis,"
Journal of the American Statistical Association, 63(1968), 1132-1161.

Farrar, D. and L. Selwyn, "Taxes, Corporate Financial Policy and Return
to Investors," National Tax Journal, 29(1967), 444-454.

Fishe, R.P.H., G.S. Maddala and R.P. Trost, "Estimation of a
Heteroscedastic Tobit Model," Manuscript, University of Florida,
1979.

Friend, I. and M. Puckett, "Dividends and Stock Prices," American
Economic Review, 54(1964), 656-682.

Goldberger, A.S., "Abnormal Selection Bias," in Studies in Econometrics,
Time Series, and Multivariate Statistics, ed. by S. Karlin, T.
Amemiya, and L.A. Goodman, New York, Academic Press, 1983.

Hakansson, N.H., "To Pay or Not to Pay Dividend," Journal of Finance,
37(1982), 415-428.


Hausman, J.A., "Specification Tests in Econometrics," Econometrica,
46(1978), 1252-1272.

Hausman, J.A. and W.E. Taylor, "Panel Data and Unobservable Individual
Effects," Econometrica, 49(1981), 1377-1398.

Hausman, J.A. and D.A. Wise, "Attrition Bias in Experimental and Panel
Data: The Gary Income Maintenance Experiment," Econometrica,
47(1979), 455-473.

Heckman, J.J., "Sample Selection Bias as a Specification Error,"
Econometrica, 47(1979), 153-161.

Heckman, J.J. and T.E. MaCurdy, "A Life Cycle Model of Female Labour
Supply," Review of Economic Studies, 47(1980), 47-74.

Higgins, R.C., "The Corporate Dividend-Saving Decision," Journal of
Financial and Quantitative Analysis, 7(1972), 1527-1541.

Hsiao, C., Analysis of Panel Data, New York, Cambridge University Press,
1986.

Hurd, M., "Estimation in Truncated Samples When There Is
Heteroscedasticity," Journal of Econometrics, 11(1979), 247-258.

Jensen, M.C. and W.H. Meckling, "Theory of the Firm: Managerial
Behavior, Agency Costs and Ownership Structure," Journal of
Financial Economics, 3(1976), 305-360.

Kalay, A., "Signaling, Information Content, and the Reluctance to Cut
Dividends," Journal of Financial and Quantitative Analysis,
15(1980), 855-869.

Kao, C., C.F. Lee and C. Wu, "The Estimation and Tests of a Partial
Adjustment Model of Dividend with Rational Expectations,"
Discussion Paper No. 25, Syracuse University, 1988.

Lee, L.-F. and G.S. Maddala, "The Common Structure of Tests for
Selectivity Bias, Serial Correlation, Heteroscedasticity and
Non-Normality in the Tobit Model," International Economic Review,
26(1985), 1-20.

Lintner, J., "Distribution of Incomes of Corporations among Dividends,
Retained Earnings, and Taxes," American Economic Review, 1956,
97-113.

Maddala, G.S., "The Use of Variance Components Models in Pooling Cross
Section and Time Series Data," Econometrica, 39(1971), 341-358.

Maddala, G.S., "The Likelihood Approach to Pooling Cross-Section and
Time-Series Data," Econometrica, 39(1971), 939-953.


Maddala, G.S., "Identification and Estimation Problems in Limited
Dependent Variable Models," in Natural Resources, Uncertainty and
General Equilibrium Systems: Essays in Memory of Rafael Lusky, ed.
by A.S. Blinder and P. Friedman, New York, Academic Press, 1977.

Maddala, G.S., Limited Dependent and Qualitative Variables in
Econometrics, New York, Cambridge University Press, 1983.

Maddala, G.S., "Limited Dependent Variable Models Using Panel Data,"
Journal of Human Resources, 22(1988), 307-338.

Maddala, G.S. and F.D. Nelson, "Specification Errors in Limited
Dependent Variable Models," NBER Working Paper No. 96, 1975.

Marsh, T.A. and R.C. Merton, "Dividend Behavior for the Aggregate Stock
Market," Journal of Business, 60(1987), 1-40.

Miller, M.H. and F. Modigliani, "Dividend Policy, Growth and the
Valuation of Shares," Journal of Business, 34(1961), 411-433.

Miller, M.H. and K. Rock, "Dividend Policy Under Asymmetric
Information," Journal of Finance, 40(1985), 1031-1051.

Miller, M.H. and M.S. Scholes, "Dividends and Taxes," Journal of
Financial Economics, 6(1978), 333-364.

Miller, M.H. and M.S. Scholes, "Dividends and Taxes: Some Empirical
Evidence," Journal of Political Economy, 90(1982), 1118-1142.

Nakamura, A. and M. Nakamura, "Rational Expectations and the Firm's
Dividend Behavior," Review of Economics and Statistics, 67(1985),
606-615.

Nelson, F.D., "A Test for Misspecification in the Censored Normal
Model," Econometrica, 49(1981), 1317-1329.

Neyman, J. and E.L. Scott, "Consistent Estimates Based on Partially
Consistent Observations," Econometrica, 16(1948), 1-32.

Olsen, R.J., "Note on the Uniqueness of the Maximum Likelihood Estimator
for the Tobit Model," Econometrica, 46(1978), 1211-1215.

Pettit, R.R., "Dividend Announcements, Security Performance, and Capital
Market Efficiency," Journal of Finance, 27(1972), 993-1007.

Pettit, R.R., "The Impact of Dividends and Earnings Announcements: A
Reconciliation," Journal of Business, 49(1976), 86-96.

Powell, J.L., "Least Absolute Deviations Estimation for the Censored
Regression Model," Journal of Econometrics, 25(1984), 303-325.


Powell, J.L., "Symmetrically Trimmed Least Squares Estimation for Tobit
Models," Econometrica, 54(1986), 1435-1460.

Riley, J.G., "Competitive Signaling," Journal of Economic Theory,
10(1975), 174-186.

Robinson, P.M., "On the Asymptotic Properties of Estimators of Models
Containing Limited Dependent Variables," Econometrica, 50(1982),
27-41.

Rosett, R., "A Statistical Model of Friction in Economics,"
Econometrica, 27(1959), 263-267.

Ross, S.A., "The Determination of Financial Structure: The
Incentive-Signalling Approach," Bell Journal of Economics, 8(1977),
23-40.

Rozeff, M.S., "Growth, Beta and Agency Costs as Determinants of Dividend
Payout Ratios," Journal of Financial Research, 5(1982), 249-259.

Spence, M., "Job Market Signaling," Quarterly Journal of Economics,
87(1973), 355-374.

Tobin, J., "Estimation of Relationships for Limited Dependent
Variables," Econometrica, 26(1958), 24-36.

Watts, R.L., "The Information Content of Dividends," Journal of
Business, 46(1973), 191-211.

Watts, R.L., "Comments on 'The Impact of Dividend and Earnings
Announcements: A Reconciliation'," Journal of Business, 49(1976),
97-106.


BIOGRAPHICAL  SKETCH 


Byeong Soo Kim was born in Taejon, Korea, in 1958. He started his
study of economics at Yonsei University in 1977. After three years of
military service, he received a bachelor's degree in economics in 1983.
He obtained an M.A. degree in economics from the Graduate School of
Yonsei University in 1985. He then entered the Ph.D. program in
economics at the University of Florida in 1985 and expects to receive a
Doctor of Philosophy degree in December 1989.




I certify  that  I have  read  this  study  and  that  in  my  opinion  it 
conforms  to  acceptable  standards  of  scholarly  presentation  and  is  fully 
adequate,  in  scope  and  quality,  as  a dissertation  for  the  degree  of 
Doctor  of  Philosophy. 


G.  S.  Maddala,  Chairman 

Graduate  Research  Professor  of  Economics 


I certify  that  I have  read  this  study  and  that  in  my  opinion  it 
conforms  to  acceptable  standards  of  scholarly  presentation  and  is  fully 
adequate,  in  scope  and  quality,  as  a dissertation  for  the  degree  of 
Doctor  of  Philosophy. 


Mark  Rush 

Associate  Professor  of  Economics 


I certify that I have read this study and that in my opinion it
conforms to acceptable standards of scholarly presentation and is fully
adequate, in scope and quality, as a dissertation for the degree of
Doctor of Philosophy.


Leonard Cheng
Associate Professor of Economics


I certify that I have read this study and that in my opinion it
conforms to acceptable standards of scholarly presentation and is fully
adequate, in scope and quality, as a dissertation for the degree of
Doctor of Philosophy.


J. S. Shonkwiler
Professor of Food and Resource Economics


This  dissertation  was  submitted  to  the  Graduate  Faculty  of  the 
Department  of  Economics  in  the  College  of  Business  Administration  and  to 
the  Graduate  School  and  was  accepted  as  a partial  fulfillment  of  the 
requirements  for  the  degree  of  Doctor  of  Philosophy. 


December  1989 


Dean,  Graduate  School 


